git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/5] Add struct strmap and associated utility functions
@ 2020-08-21 18:52 Elijah Newren via GitGitGadget
  2020-08-21 18:52 ` [PATCH 1/5] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-21 18:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren

Here I introduce a new strmap type, which my new merge backed, merge-ort,
uses heavily. (I also made significant use of it in my changes to
diffcore-rename). This strmap type was based on Peff's proposal from a
couple years ago[1], but has additions that I made as I used it. I also
start the series off with a quick documentation improvement to hashmap.c to
differentiate between hashmap_free() and hashmap_free_entries(), since I
personally had difficulty understanding them and it affects how
strmap_clear()/strmap_free() are written.

The biggest issue I know about currently concerns the convenience functions
for a string->integer mapping. I wanted such a mapping that didn't need to
allocate an extra int but instead works by just type-casting the void*
pointer to an int instead. That all seems to work, but I needed a separate
name for that type, and the problem is that I couldn't come up with a good
one as you'll see in the last patch. Suggestions for better naming are very
much welcome. As are, of course, suggestions for other API or implementation
improvements.

[1] 
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Elijah Newren (5):
  hashmap: add usage documentation explaining hashmap_free[_entries]()
  strmap: new utility functions
  strmap: add more utility functions
  strmap: add strdup_strings option
  strmap: add functions facilitating use as a string->int map

 Makefile  |   1 +
 hashmap.h |  27 +++++++++++-
 strmap.c  | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 strmap.h  | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 275 insertions(+), 2 deletions(-)
 create mode 100644 strmap.c
 create mode 100644 strmap.h


base-commit: 675a4aaf3b226c0089108221b96559e0baae5de9
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-835%2Fnewren%2Fstrmap-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-835/newren/strmap-v1
Pull-Request: https://github.com/git/git/pull/835
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH 1/5] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
@ 2020-08-21 18:52 ` Elijah Newren via GitGitGadget
  2020-08-21 19:22   ` Jeff King
  2020-08-21 18:52 ` [PATCH 2/5] strmap: new utility functions Elijah Newren via GitGitGadget
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-21 18:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The existence of hashmap_free() and hashmap_free_entries() confused me,
and the docs weren't clear enough.  I had to consult other source code
examples and the implementation.  Add a brief note to clarify,
especially since hashmap_clear*() variants may be added in the future.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.h | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/hashmap.h b/hashmap.h
index ef220de4c6..a2f4adc1b3 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -236,13 +236,36 @@ void hashmap_init(struct hashmap *map,
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
- * Frees a hashmap structure and allocated memory, leaves entries undisturbed
+ * Frees a hashmap structure and allocated memory for the table, but does not
+ * free the entries nor anything they point to.
+ *
+ * Usage note:
+ *
+ * Many callers will need to iterate over all entries and free the data each
+ * entry points to; in such a case, they can free the entry itself while at it.
+ * Thus, you might see:
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *      free(e);
+ *    }
+ *    hashmap_free(map);
+ * instead of
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *    }
+ *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ * to avoid the implicit extra loop over the entries.  However, if there are
+ * no special fields in your entry that need to be freed beyond the entry
+ * itself, it is probably simpler to avoid the explicit loop and just call
+ * hashmap_free_entries().
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map
+ * where @member is the hashmap_entry struct used to associate with @map.
+ *
+ * See usage note above hashmap_free().
  */
 #define hashmap_free_entries(map, type, member) \
 	hashmap_free_(map, offsetof(type, member));
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH 2/5] strmap: new utility functions
  2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
  2020-08-21 18:52 ` [PATCH 1/5] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
@ 2020-08-21 18:52 ` Elijah Newren via GitGitGadget
  2020-08-21 19:48   ` Jeff King
  2020-08-21 18:52 ` [PATCH 3/5] strmap: add more " Elijah Newren via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-21 18:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add strmap as a new struct and associated utility functions,
specifically for hashmaps that map strings to some value.  The API is
taken directly from Peff's proposal at
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Peff only included the header, not the implementation, so it isn't clear what
the structure was he was going to use for the hash entries.  Instead of having
my str_entry struct have three subfields (the hashmap_entry, the string, and
the void* value), I made it only have two -- the hashmap_entry and a
string_list_item, for two reasons:

  1) a strmap is often the data structure we want where string_list has
     been used in the past.  Using the same building block for
     individual entries in both makes it easier to adopt and reuse
     parts of the string_list API in strmap.

  2) In some cases, after doing lots of other work, I want to be able
     to iterate over the items in my strmap in sorted order.  hashmap
     obviously doesn't support that, but I wanted to be able to export
     the strmap to a string_list easily and then use its functions.
     (Note: I do not need the data structure to both be sorted and have
     efficient lookup at all times.  If I did, I might use a B-tree
     instead, as suggested by brian in response to Peff in the thread
     noted above.  In my case, most strmaps will never need sorting, but
     in one special case at the very end of a bunch of other work I want
     to iterate over the items in sorted order without doing any more
     lookups afterward.)

Also, I removed the STRMAP_INIT macro, since it cannot be used to
correctly initialize a strmap; the underlying hashmap needs a call to
hashmap_init() to allocate the hash table first.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Makefile |  1 +
 strmap.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 strmap.h | 47 ++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)
 create mode 100644 strmap.c
 create mode 100644 strmap.h

diff --git a/Makefile b/Makefile
index 65f8cfb236..0da15a9ee5 100644
--- a/Makefile
+++ b/Makefile
@@ -988,6 +988,7 @@ LIB_OBJS += strbuf.o
 LIB_OBJS += strvec.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
+LIB_OBJS += strmap.o
 LIB_OBJS += sub-process.o
 LIB_OBJS += submodule-config.o
 LIB_OBJS += submodule.o
diff --git a/strmap.c b/strmap.c
new file mode 100644
index 0000000000..1c9fdb3b1e
--- /dev/null
+++ b/strmap.c
@@ -0,0 +1,81 @@
+#include "git-compat-util.h"
+#include "strmap.h"
+
+static int cmp_str_entry(const void *hashmap_cmp_fn_data,
+			 const struct hashmap_entry *entry1,
+			 const struct hashmap_entry *entry2,
+			 const void *keydata)
+{
+	const struct str_entry *e1, *e2;
+
+	e1 = container_of(entry1, const struct str_entry, ent);
+	e2 = container_of(entry2, const struct str_entry, ent);
+	return strcmp(e1->item.string, e2->item.string);
+}
+
+static struct str_entry *find_str_entry(struct strmap *map,
+					const char *str)
+{
+	struct str_entry entry;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.item.string = (char *)str;
+	return hashmap_get_entry(&map->map, &entry, ent, NULL);
+}
+
+void strmap_init(struct strmap *map)
+{
+	hashmap_init(&map->map, cmp_str_entry, NULL, 0);
+}
+
+void strmap_clear(struct strmap *map, int free_util)
+{
+	struct hashmap_iter iter;
+	struct str_entry *e;
+
+	if (!map)
+		return;
+
+	hashmap_for_each_entry(&map->map, &iter, e, ent /* member name */) {
+		free(e->item.string);
+		if (free_util)
+			free(e->item.util);
+	}
+	hashmap_free_entries(&map->map, struct str_entry, ent);
+	strmap_init(map);
+}
+
+/*
+ * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
+ * it does not need to persist after the this function is called.
+ *
+ * If an entry for "str" already exists, its data pointer is overwritten, and
+ * the original data pointer returned. Otherwise, returns NULL.
+ */
+void *strmap_put(struct strmap *map, const char *str, void *data)
+{
+	struct str_entry *entry = find_str_entry(map, str);
+	void *old = NULL;
+
+	if (entry) {
+		old = entry->item.util;
+		entry->item.util = data;
+	} else {
+		entry = xmalloc(sizeof(*entry));
+		hashmap_entry_init(&entry->ent, strhash(str));
+		entry->item.string = strdup(str);
+		entry->item.util = data;
+		hashmap_add(&map->map, &entry->ent);
+	}
+	return old;
+}
+
+void *strmap_get(struct strmap *map, const char *str)
+{
+	struct str_entry *entry = find_str_entry(map, str);
+	return entry ? entry->item.util : NULL;
+}
+
+int strmap_contains(struct strmap *map, const char *str)
+{
+	return find_str_entry(map, str) != NULL;
+}
diff --git a/strmap.h b/strmap.h
new file mode 100644
index 0000000000..eb5807f6fa
--- /dev/null
+++ b/strmap.h
@@ -0,0 +1,47 @@
+#ifndef STRMAP_H
+#define STRMAP_H
+
+#include "hashmap.h"
+#include "string-list.h"
+
+struct strmap {
+	struct hashmap map;
+};
+
+struct str_entry {
+	struct hashmap_entry ent;
+	struct string_list_item item;
+};
+
+/*
+ * Initialize an empty strmap
+ */
+void strmap_init(struct strmap *map);
+
+/*
+ * Remove all entries from the map, releasing any allocated resources.
+ */
+void strmap_clear(struct strmap *map, int free_values);
+
+/*
+ * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
+ * it does not need to persist after the this function is called.
+ *
+ * If an entry for "str" already exists, its data pointer is overwritten, and
+ * the original data pointer returned. Otherwise, returns NULL.
+ */
+void *strmap_put(struct strmap *map, const char *str, void *data);
+
+/*
+ * Return the data pointer mapped by "str", or NULL if the entry does not
+ * exist.
+ */
+void *strmap_get(struct strmap *map, const char *str);
+
+/*
+ * Return non-zero iff "str" is present in the map. This differs from
+ * strmap_get() in that it can distinguish entries with a NULL data pointer.
+ */
+int strmap_contains(struct strmap *map, const char *str);
+
+#endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH 3/5] strmap: add more utility functions
  2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
  2020-08-21 18:52 ` [PATCH 1/5] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
  2020-08-21 18:52 ` [PATCH 2/5] strmap: new utility functions Elijah Newren via GitGitGadget
@ 2020-08-21 18:52 ` Elijah Newren via GitGitGadget
  2020-08-21 19:58   ` Jeff King
  2020-08-21 18:52 ` [PATCH 4/5] strmap: add strdup_strings option Elijah Newren via GitGitGadget
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-21 18:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This adds a number of additional convienence functions I want/need:
  * strmap_empty()
  * strmap_get_size()
  * strmap_remove()
  * strmap_for_each_entry()
  * strmap_free()
  * strmap_get_item()

I suspect the first four are self-explanatory.

strmap_free() differs from strmap_clear() in that the data structure is
not reusable after it is called; strmap_clear() is not sufficient for
the API because without strmap_free() we will leak memory.

strmap_get_item() is similar to strmap_get() except that instead of just
returning the void* value that the string maps to, it returns the
string_list_item that contains both the string and the void* value (or
NULL if the string isn't in the map).  This is helpful because it avoids
multiple lookups, e.g. in some cases a caller would need to call:
  * strmap_contains() to check that the map has an entry for the string
  * strmap_get() to get the void* value
  * <do some work to update the value>
  * strmap_put() to update/overwrite the value
If the void* pointer returned really is a pointer, then the last step is
unnecessary, but if the void* pointer is just cast to an integer then
strmap_put() will be needed.  In contrast, one can call strmap_get_item()
and then:
  * check if the string was in the map by whether the pointer is NULL
  * access the value via item->util
  * directly update item->util
meaning that we can replace two or three hash table lookups with one.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 35 ++++++++++++++++++++++++++++++-----
 strmap.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/strmap.c b/strmap.c
index 1c9fdb3b1e..a4bfffcd8b 100644
--- a/strmap.c
+++ b/strmap.c
@@ -27,7 +27,7 @@ void strmap_init(struct strmap *map)
 	hashmap_init(&map->map, cmp_str_entry, NULL, 0);
 }
 
-void strmap_clear(struct strmap *map, int free_util)
+void strmap_free(struct strmap *map, int free_util)
 {
 	struct hashmap_iter iter;
 	struct str_entry *e;
@@ -35,12 +35,19 @@ void strmap_clear(struct strmap *map, int free_util)
 	if (!map)
 		return;
 
-	hashmap_for_each_entry(&map->map, &iter, e, ent /* member name */) {
-		free(e->item.string);
-		if (free_util)
-			free(e->item.util);
+	if (free_util) {
+		hashmap_for_each_entry(&map->map, &iter, e, ent) {
+			free(e->item.string);
+			if (free_util)
+				free(e->item.util);
+		}
 	}
 	hashmap_free_entries(&map->map, struct str_entry, ent);
+}
+
+void strmap_clear(struct strmap *map, int free_util)
+{
+	strmap_free(map, free_util);
 	strmap_init(map);
 }
 
@@ -69,6 +76,13 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 	return old;
 }
 
+struct string_list_item *strmap_get_item(struct strmap *map,
+					 const char *str)
+{
+	struct str_entry *entry = find_str_entry(map, str);
+	return entry ? &entry->item : NULL;
+}
+
 void *strmap_get(struct strmap *map, const char *str)
 {
 	struct str_entry *entry = find_str_entry(map, str);
@@ -79,3 +93,14 @@ int strmap_contains(struct strmap *map, const char *str)
 {
 	return find_str_entry(map, str) != NULL;
 }
+
+void strmap_remove(struct strmap *map, const char *str, int free_util)
+{
+	struct str_entry entry, *ret;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.item.string = (char *)str;
+	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
+	if (ret && free_util)
+		free(ret->item.util);
+	free(ret);
+}
diff --git a/strmap.h b/strmap.h
index eb5807f6fa..45d0a4f714 100644
--- a/strmap.h
+++ b/strmap.h
@@ -21,6 +21,11 @@ void strmap_init(struct strmap *map);
 /*
  * Remove all entries from the map, releasing any allocated resources.
  */
+void strmap_free(struct strmap *map, int free_values);
+
+/*
+ * Same as calling strmap_free() followed by strmap_init().
+ */
 void strmap_clear(struct strmap *map, int free_values);
 
 /*
@@ -32,6 +37,12 @@ void strmap_clear(struct strmap *map, int free_values);
  */
 void *strmap_put(struct strmap *map, const char *str, void *data);
 
+/*
+ * Return the string_list_item mapped by "str", or NULL if there is not such
+ * an item in map.
+ */
+struct string_list_item *strmap_get_item(struct strmap *map, const char *str);
+
 /*
  * Return the data pointer mapped by "str", or NULL if the entry does not
  * exist.
@@ -44,4 +55,36 @@ void *strmap_get(struct strmap *map, const char *str);
  */
 int strmap_contains(struct strmap *map, const char *str);
 
+/*
+ * Remove the given entry from the strmap.  If the string isn't in the
+ * strmap, the map is not altered.
+ */
+void strmap_remove(struct strmap *map, const char *str, int free_value);
+
+/*
+ * Return whether the strmap is empty.
+ */
+static inline int strmap_empty(struct strmap *map)
+{
+	return hashmap_get_size(&map->map) == 0;
+}
+
+/*
+ * Return how many entries the strmap has.
+ */
+static inline unsigned int strmap_get_size(struct strmap *map)
+{
+	return hashmap_get_size(&map->map);
+}
+
+/*
+ * iterate through @map using @iter, @var is a pointer to a type str_entry
+ */
+#define strmap_for_each_entry(mystrmap, iter, var)	\
+	for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, \
+						   OFFSETOF_VAR(var, ent)); \
+		var; \
+		var = hashmap_iter_next_entry_offset(iter, \
+						OFFSETOF_VAR(var, ent)))
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH 4/5] strmap: add strdup_strings option
  2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-08-21 18:52 ` [PATCH 3/5] strmap: add more " Elijah Newren via GitGitGadget
@ 2020-08-21 18:52 ` Elijah Newren via GitGitGadget
  2020-08-21 20:01   ` Jeff King
  2020-08-21 18:52 ` [PATCH 5/5] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-21 18:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Just as it is sometimes useful for string_list to duplicate and take
ownership of memory management of the strings it contains, the same is
sometimes true for strmaps as well.  Add the same flag from string_list
to strmap.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 23 ++++++++++++++++-------
 strmap.h |  9 +++++----
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/strmap.c b/strmap.c
index a4bfffcd8b..03eb6af45d 100644
--- a/strmap.c
+++ b/strmap.c
@@ -22,9 +22,10 @@ static struct str_entry *find_str_entry(struct strmap *map,
 	return hashmap_get_entry(&map->map, &entry, ent, NULL);
 }
 
-void strmap_init(struct strmap *map)
+void strmap_init(struct strmap *map, int strdup_strings)
 {
 	hashmap_init(&map->map, cmp_str_entry, NULL, 0);
+	map->strdup_strings = strdup_strings;
 }
 
 void strmap_free(struct strmap *map, int free_util)
@@ -35,9 +36,10 @@ void strmap_free(struct strmap *map, int free_util)
 	if (!map)
 		return;
 
-	if (free_util) {
+	if (map->strdup_strings || free_util) {
 		hashmap_for_each_entry(&map->map, &iter, e, ent) {
-			free(e->item.string);
+			if (map->strdup_strings)
+				free(e->item.string);
 			if (free_util)
 				free(e->item.util);
 		}
@@ -48,12 +50,11 @@ void strmap_free(struct strmap *map, int free_util)
 void strmap_clear(struct strmap *map, int free_util)
 {
 	strmap_free(map, free_util);
-	strmap_init(map);
+	strmap_init(map, map->strdup_strings);
 }
 
 /*
- * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
- * it does not need to persist after the this function is called.
+ * Insert "str" into the map, pointing to "data".
  *
  * If an entry for "str" already exists, its data pointer is overwritten, and
  * the original data pointer returned. Otherwise, returns NULL.
@@ -69,7 +70,13 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 	} else {
 		entry = xmalloc(sizeof(*entry));
 		hashmap_entry_init(&entry->ent, strhash(str));
-		entry->item.string = strdup(str);
+		/*
+		 * We won't modify entry->item.string so it really should be
+		 * const, but changing string_list_item to use a const char *
+		 * is a bit too big of a change at this point.
+		 */
+		entry->item.string =
+			map->strdup_strings ? xstrdup(str) : (char *)str;
 		entry->item.util = data;
 		hashmap_add(&map->map, &entry->ent);
 	}
@@ -100,6 +107,8 @@ void strmap_remove(struct strmap *map, const char *str, int free_util)
 	hashmap_entry_init(&entry.ent, strhash(str));
 	entry.item.string = (char *)str;
 	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
+	if (map->strdup_strings)
+		free(ret->item.string);
 	if (ret && free_util)
 		free(ret->item.util);
 	free(ret);
diff --git a/strmap.h b/strmap.h
index 45d0a4f714..28a98c5a4b 100644
--- a/strmap.h
+++ b/strmap.h
@@ -6,6 +6,7 @@
 
 struct strmap {
 	struct hashmap map;
+	unsigned int strdup_strings:1;
 };
 
 struct str_entry {
@@ -14,9 +15,10 @@ struct str_entry {
 };
 
 /*
- * Initialize an empty strmap
+ * Initialize the members of the strmap, set `strdup_strings`
+ * member according to the value of the second parameter.
  */
-void strmap_init(struct strmap *map);
+void strmap_init(struct strmap *map, int strdup_strings);
 
 /*
  * Remove all entries from the map, releasing any allocated resources.
@@ -29,8 +31,7 @@ void strmap_free(struct strmap *map, int free_values);
 void strmap_clear(struct strmap *map, int free_values);
 
 /*
- * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
- * it does not need to persist after the this function is called.
+ * Insert "str" into the map, pointing to "data".
  *
  * If an entry for "str" already exists, its data pointer is overwritten, and
  * the original data pointer returned. Otherwise, returns NULL.
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH 5/5] strmap: add functions facilitating use as a string->int map
  2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-08-21 18:52 ` [PATCH 4/5] strmap: add strdup_strings option Elijah Newren via GitGitGadget
@ 2020-08-21 18:52 ` Elijah Newren via GitGitGadget
  2020-08-21 20:10   ` Jeff King
  2020-08-21 20:16 ` [PATCH 0/5] Add struct strmap and associated utility functions Jeff King
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
  6 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-21 18:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Although strmap could be used as a string->int map, one either had to
allocate an int for every entry and then deallocate later, or one had to
do a bunch of casting between (void*) and (intptr_t).

Add some special functions that do the casting.  Also, rename put->set
for such wrapper functions since 'put' implied there may be some
deallocation needed if the string was already found in the map, which
isn't the case when we're storing an int value directly in the void*
slot instead of using the void* slot as a pointer to data.

A note on the name: strintmap looks and sounds pretty lame to me, but
after trying to come up with something better and having no luck, I
figured I'd just go with it for a while and then at some point some
better and obvious name would strike me and I could replace it.  Several
months later, I still don't have a better name.  Hopefully someone else
has one.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 11 +++++++++++
 strmap.h | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/strmap.c b/strmap.c
index 03eb6af45d..cbb99f4030 100644
--- a/strmap.c
+++ b/strmap.c
@@ -113,3 +113,14 @@ void strmap_remove(struct strmap *map, const char *str, int free_util)
 		free(ret->item.util);
 	free(ret);
 }
+
+void strintmap_incr(struct strmap *map, const char *str, intptr_t amt)
+{
+	struct str_entry *entry = find_str_entry(map, str);
+	if (entry) {
+		intptr_t *whence = (intptr_t*)&entry->item.util;
+		*whence += amt;
+	}
+	else
+		strintmap_set(map, str, amt);
+}
diff --git a/strmap.h b/strmap.h
index 28a98c5a4b..5d9dd3ef58 100644
--- a/strmap.h
+++ b/strmap.h
@@ -88,4 +88,36 @@ static inline unsigned int strmap_get_size(struct strmap *map)
 		var = hashmap_iter_next_entry_offset(iter, \
 						OFFSETOF_VAR(var, ent)))
 
+/*
+ * Helper functions for using strmap as map of string -> int, using the void*
+ * field to store the int instead of allocating an int and having the void*
+ * member point to the allocated int.
+ */
+
+static inline int strintmap_get(struct strmap *map, const char *str,
+				int default_value)
+{
+	struct string_list_item *result = strmap_get_item(map, str);
+	if (!result)
+		return default_value;
+	return (intptr_t)result->util;
+}
+
+static inline void strintmap_set(struct strmap *map, const char *str, intptr_t v)
+{
+	strmap_put(map, str, (void *)v);
+}
+
+void strintmap_incr(struct strmap *map, const char *str, intptr_t amt);
+
+static inline void strintmap_clear(struct strmap *map)
+{
+	strmap_clear(map, 0);
+}
+
+static inline void strintmap_free(struct strmap *map)
+{
+	strmap_free(map, 0);
+}
+
 #endif /* STRMAP_H */
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 1/5] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-08-21 18:52 ` [PATCH 1/5] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
@ 2020-08-21 19:22   ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-08-21 19:22 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Fri, Aug 21, 2020 at 06:52:25PM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> The existence of hashmap_free() and hashmap_free_entries() confused me,
> and the docs weren't clear enough.  I had to consult other source code
> examples and the implementation.  Add a brief note to clarify,
> especially since hashmap_clear*() variants may be added in the future.

Thanks, I think this is worth doing and the text looks clear and correct
to me.

>  /*
> - * Frees a hashmap structure and allocated memory, leaves entries undisturbed
> + * Frees a hashmap structure and allocated memory for the table, but does not
> + * free the entries nor anything they point to.
> + *
> + * Usage note:
> + *
> + * Many callers will need to iterate over all entries and free the data each
> + * entry points to; in such a case, they can free the entry itself while at it.
> + * Thus, you might see:
> + *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
> + *      free(e->somefield);
> + *      free(e);
> + *    }
> + *    hashmap_free(map);
> + * instead of
> + *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
> + *      free(e->somefield);
> + *    }
> + *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
> + * to avoid the implicit extra loop over the entries.  However, if there are
> + * no special fields in your entry that need to be freed beyond the entry
> + * itself, it is probably simpler to avoid the explicit loop and just call
> + * hashmap_free_entries().

A minor nit, but a blank line between the code snippets and the text
might make it a little more readable.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 2/5] strmap: new utility functions
  2020-08-21 18:52 ` [PATCH 2/5] strmap: new utility functions Elijah Newren via GitGitGadget
@ 2020-08-21 19:48   ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-08-21 19:48 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Fri, Aug 21, 2020 at 06:52:26PM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> Add strmap as a new struct and associated utility functions,
> specifically for hashmaps that map strings to some value.  The API is
> taken directly from Peff's proposal at
> https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Uh oh. You are encouraging me in the belief that I can send half-baked
ideas to the list and somebody will come along and implement them for
me. ;)

> Peff only included the header, not the implementation, so it isn't clear what
> the structure was he was going to use for the hash entries.  Instead of having
> my str_entry struct have three subfields (the hashmap_entry, the string, and
> the void* value), I made it only have two -- the hashmap_entry and a
> string_list_item, for two reasons:

I'd probably have done:

  struct strmap_entry {
	struct hashmap_entry ent;
	void *value;
	char key[FLEX_ALLOC];
  };

That saves 8 bytes (plus malloc overhead)per item, plus avoids an extra
pointer-chase for each item we consider when looking up.

>   1) a strmap is often the data structure we want where string_list has
>      been used in the past.  Using the same building block for
>      individual entries in both makes it easier to adopt and reuse
>      parts of the string_list API in strmap.

I can see that there might be some value in being able to interchange
the items for code that's expecting a string_list_item. But I have to
wonder if the potential for confusion is worth it. I.e., should that
code really be expecting a raw string pointer (possibly with a separate
void pointer, or even better an actual typed pointer).

I'll keep an eye out as I read the rest of the series for code which
uses this.

>   2) In some cases, after doing lots of other work, I want to be able
>      to iterate over the items in my strmap in sorted order.  hashmap
>      obviously doesn't support that, but I wanted to be able to export
>      the strmap to a string_list easily and then use its functions.
>      (Note: I do not need the data structure to both be sorted and have
>      efficient lookup at all times.  If I did, I might use a B-tree
>      instead, as suggested by brian in response to Peff in the thread
>      noted above.  In my case, most strmaps will never need sorting, but
>      in one special case at the very end of a bunch of other work I want
>      to iterate over the items in sorted order without doing any more
>      lookups afterward.)

Hmm. Likewise, I'll keep an eye open for how this works in practice. I
do suspect that a B-tree might be a better solution here, but
implementing it is non-trivial, and most callers don't care about this
property.

If the interim solution is to just dump it to a string_list and sort
that, that's really not that bad, assuming it just happens once after
we've added everything. I'm not sure there's that big a benefit to using
string_list_item internally, since presumably that conversion needs to
write a whole new array of string_list_items anyway.

> Also, I removed the STRMAP_INIT macro, since it cannot be used to
> correctly initialize a strmap; the underlying hashmap needs a call to
> hashmap_init() to allocate the hash table first.

Since access to the underlying hashmap happens through strmap functions,
they could lazily initialize it. That's how oidmap works.

> diff --git a/strmap.c b/strmap.c
> new file mode 100644
> index 0000000000..1c9fdb3b1e
> --- /dev/null
> +++ b/strmap.c
> @@ -0,0 +1,81 @@
> +#include "git-compat-util.h"
> +#include "strmap.h"
> +
> +static int cmp_str_entry(const void *hashmap_cmp_fn_data,
> +			 const struct hashmap_entry *entry1,
> +			 const struct hashmap_entry *entry2,
> +			 const void *keydata)
> +{
> +	const struct str_entry *e1, *e2;
> +
> +	e1 = container_of(entry1, const struct str_entry, ent);
> +	e2 = container_of(entry2, const struct str_entry, ent);
> +	return strcmp(e1->item.string, e2->item.string);
> +}

If you do go the FLEX_ALLOC route, obviously lookups won't want to
allocate a str_entry struct for the lookup key. You'd use keydata there
(and prefer it over looking at entry2 at all). See remotes_hash_cmp()
for an example.

> +static struct str_entry *find_str_entry(struct strmap *map,
> +					const char *str)
> +{
> +	struct str_entry entry;
> +	hashmap_entry_init(&entry.ent, strhash(str));
> +	entry.item.string = (char *)str;
> +	return hashmap_get_entry(&map->map, &entry, ent, NULL);
> +}

Casting away constness here is awkward. It could likewise benefit from
using keydata, so you wouldn't need to create a temporary
string_list_item (which is where the non-constness comes from).

> +void strmap_clear(struct strmap *map, int free_util)
> +{
> +	struct hashmap_iter iter;
> +	struct str_entry *e;
> +
> +	if (!map)
> +		return;

In a lazy-init world, this becomes:

  if (!map || !map->map.table)

Of course it would be better still if the hashmap code learned to do the
lazy-init stuff itself.

> +	hashmap_for_each_entry(&map->map, &iter, e, ent /* member name */) {
> +		free(e->item.string);
> +		if (free_util)
> +			free(e->item.util);
> +	}
> +	hashmap_free_entries(&map->map, struct str_entry, ent);

With a flex-alloc struct, you can avoid the extra string free. But I
guess you still wouldn't avoid the loop if you want to support
free_entries().

I wonder if it would make the API simpler if the struct knew whether it
owned the void pointer values or not. Then you'd do:

  struct strmap foo = { .free_values = 1 };
  ...
  strmap_put(&foo, "key", value);
  ...
  strmap_clear(&foo);

and wouldn't have to remember to do the right thing at clear-time. It is
a little less flexible (e.g., if you transfer ownership after a certain
point in the code), but I wonder if any callers actually need that (and
they could always set the free_values flag then).

> +/*
> + * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
> + * it does not need to persist after the this function is called.
> + *
> + * If an entry for "str" already exists, its data pointer is overwritten, and
> + * the original data pointer returned. Otherwise, returns NULL.
> + */
> +void *strmap_put(struct strmap *map, const char *str, void *data)

Minor, but IMHO we should avoid copying the docstrings to the
implementation, since it gives two places that people have to remember
to update if the API changes.

> +void *strmap_put(struct strmap *map, const char *str, void *data)
> +{
> +	struct str_entry *entry = find_str_entry(map, str);
> +	void *old = NULL;

In a lazy-init world, this is:

  if (!map->map.table) {
	strmap_init(map);
	entry = NULL;
  } else {
        entry = find_str_entry(map, str);
  }

(or just call find_str_entry() in both cases and let it realize there's
nothing to find).

> +	if (entry) {
> +		old = entry->item.util;
> +		entry->item.util = data;
> +	} else {
> +		entry = xmalloc(sizeof(*entry));
> +		hashmap_entry_init(&entry->ent, strhash(str));
> +		entry->item.string = strdup(str);
> +		entry->item.util = data;
> +		hashmap_add(&map->map, &entry->ent);
> +	}

And in a flex-alloc world, this second block is:

  FLEX_ALLOC_STR(entry, key, str);
  hashmap_entry_init(&entry->ent, strhash(str));
  entry->value = data;
  hashmap_add(&map->map, &entry->ent);

> +void *strmap_get(struct strmap *map, const char *str)
> +{
> +	struct str_entry *entry = find_str_entry(map, str);
> +	return entry ? entry->item.util : NULL;
> +}

In a lazy world, this is:

  if (!map->map.table)
          return NULL;

> +int strmap_contains(struct strmap *map, const char *str)
> +{
> +	return find_str_entry(map, str) != NULL;
> +}

And likewise:

  if (!map->map.table)
          return NULL;

It might actually be easier to stick that in find_str_entry().

The rest of it all looked good to me.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 3/5] strmap: add more utility functions
  2020-08-21 18:52 ` [PATCH 3/5] strmap: add more " Elijah Newren via GitGitGadget
@ 2020-08-21 19:58   ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-08-21 19:58 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Fri, Aug 21, 2020 at 06:52:27PM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> This adds a number of additional convienence functions I want/need:
>   * strmap_empty()
>   * strmap_get_size()
>   * strmap_remove()
>   * strmap_for_each_entry()
>   * strmap_free()
>   * strmap_get_item()
> 
> I suspect the first four are self-explanatory.

Yup, all make sense. We might also want real iterators rather than
strmap_for_each_entry(), which can be a bit more convenient given the
lack of lambdas in C. But I'd be happy to wait until a caller arises.

> strmap_free() differs from strmap_clear() in that the data structure is
> not reusable after it is called; strmap_clear() is not sufficient for
> the API because without strmap_free() we will leak memory.

Hmm, I missed in the previous function that strmap_clear() is actually
leaving allocated memory. I think this is bad, because it's unlike most
of our other data structure clear() functions.

We could work around it with the lazy-init stuff I mentioned in my last
email (i.e., _don't_ strmap_init() at the end of strmap_clear(), and
just let strmap_put() take care of initializing if somebody actually
adds something again).

But IMHO this is a sign that we should be fixing hashmap() to work like
that, too.

> strmap_get_item() is similar to strmap_get() except that instead of just
> returning the void* value that the string maps to, it returns the
> string_list_item that contains both the string and the void* value (or
> NULL if the string isn't in the map).  This is helpful because it avoids
> multiple lookups, e.g. in some cases a caller would need to call:
>   * strmap_contains() to check that the map has an entry for the string
>   * strmap_get() to get the void* value
>   * <do some work to update the value>
>   * strmap_put() to update/overwrite the value

That makes sense. If you follow my suggestion to drop string_list_item,
then it would be OK to return the whole str_entry. (I forgot to mention
in the last patch, but perhaps strmap_entry would be a more distinctive
name).

> diff --git a/strmap.h b/strmap.h
> index eb5807f6fa..45d0a4f714 100644
> --- a/strmap.h
> +++ b/strmap.h
> @@ -21,6 +21,11 @@ void strmap_init(struct strmap *map);
>  /*
>   * Remove all entries from the map, releasing any allocated resources.
>   */
> +void strmap_free(struct strmap *map, int free_values);
> +
> +/*
> + * Same as calling strmap_free() followed by strmap_init().
> + */
>  void strmap_clear(struct strmap *map, int free_values);

I guess the docstring was a bit inaccurate in the previous patch, then. :)

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 4/5] strmap: add strdup_strings option
  2020-08-21 18:52 ` [PATCH 4/5] strmap: add strdup_strings option Elijah Newren via GitGitGadget
@ 2020-08-21 20:01   ` Jeff King
  2020-08-21 20:41     ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-08-21 20:01 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Fri, Aug 21, 2020 at 06:52:28PM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> Just as it is sometimes useful for string_list to duplicate and take
> ownership of memory management of the strings it contains, the same is
> sometimes true for strmaps as well.  Add the same flag from string_list
> to strmap.

This is actually one of the ugliest parts of string_list, IMHO, and I'd
prefer if we can avoid duplicating it. Yes, sometimes we can manage to
avoid an extra copy of a string. But the resulting ownership and
lifetime questions are often very error-prone. In other data structures
we've moved towards just having the structure own its data (e.g.,
strvec does so, and things like oidmap store their own oids). I've been
happy with the simplicity of it.

It also works if you use a flex-array for the key storage in the
strmap_entry. :)

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 5/5] strmap: add functions facilitating use as a string->int map
  2020-08-21 18:52 ` [PATCH 5/5] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
@ 2020-08-21 20:10   ` Jeff King
  2020-08-21 20:51     ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-08-21 20:10 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Fri, Aug 21, 2020 at 06:52:29PM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> Although strmap could be used as a string->int map, one either had to
> allocate an int for every entry and then deallocate later, or one had to
> do a bunch of casting between (void*) and (intptr_t).
> 
> Add some special functions that do the casting.  Also, rename put->set
> for such wrapper functions since 'put' implied there may be some
> deallocation needed if the string was already found in the map, which
> isn't the case when we're storing an int value directly in the void*
> slot instead of using the void* slot as a pointer to data.

I think wrapping this kind of hackery is worth doing.

You'd be able to use put() as usual, wouldn't you? It never deallocates
the util field, but just returns the old one. And the caller knows that
it's really an int, and shouldn't be deallocated.

> A note on the name: strintmap looks and sounds pretty lame to me, but
> after trying to come up with something better and having no luck, I
> figured I'd just go with it for a while and then at some point some
> better and obvious name would strike me and I could replace it.  Several
> months later, I still don't have a better name.  Hopefully someone else
> has one.

strnummap? That's pretty bad, too.

Since the functions all take a raw strmap, you _could_ just do
"strmap_getint()", etc. But I think you could actually get some
additional safety by defining a wrapper type:

  struct strintmap {
          struct strmap strmap;
  };

It's a bit annoying because you a bunch of pass-through boilerplate for
stuff like:

  static inline int strintmap_empty(struct strintmap *map)
  {
          return strmap_empty(&map->map);
  }

but it would prevent mistakes like:

  strintmap_incr(&map, "foo", 10);
  strmap_clear(&map, 1);

which would try to free (void *)10.  I'm not sure if that's worth it or
not. You'd almost have to be trying to fail to pass "1" for free_util
there. But I've seen stranger things. :)

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 0/5] Add struct strmap and associated utility functions
  2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-08-21 18:52 ` [PATCH 5/5] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
@ 2020-08-21 20:16 ` Jeff King
  2020-08-21 21:33   ` Elijah Newren
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
  6 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-08-21 20:16 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Fri, Aug 21, 2020 at 06:52:24PM +0000, Elijah Newren via GitGitGadget wrote:

> Here I introduce a new strmap type, which my new merge backed, merge-ort,
> uses heavily. (I also made significant use of it in my changes to
> diffcore-rename). This strmap type was based on Peff's proposal from a
> couple years ago[1], but has additions that I made as I used it. I also
> start the series off with a quick documentation improvement to hashmap.c to
> differentiate between hashmap_free() and hashmap_free_entries(), since I
> personally had difficulty understanding them and it affects how
> strmap_clear()/strmap_free() are written.

I like the direction overall (unsurprisingly), but left a bunch of
comments. I do think if we're going to do this that it may be worth
cleaning up hashmap a bit first, especially around its clear/free
semantics, and its ability to lazy-allocate the table.

I'm happy to work on that, but don't want to step on your toes.

I also wonder if you looked at the khash stuff at all. Especially for
storing integers, it makes things much more natural. You'd do something
like:

  /* you might even be able to just write !strcmp in the macro below */
  static inline int streq(const char *a, const char *b)
  {
          return !strcmp(a, b);
  }

  KHASH_INIT(strint_map, char *, int, 1, strhash, streq);

and then you'd probably want a "put" wrapper that makes a copy of the
string. khash has its own charming awkwardness, but I'm just curious if you
looked at it and found it more awkward than hashmap.c, or if you just
didn't look at it.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 4/5] strmap: add strdup_strings option
  2020-08-21 20:01   ` Jeff King
@ 2020-08-21 20:41     ` Elijah Newren
  2020-08-21 21:03       ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-08-21 20:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 21, 2020 at 1:01 PM Jeff King <peff@peff.net> wrote:
>
> On Fri, Aug 21, 2020 at 06:52:28PM +0000, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > Just as it is sometimes useful for string_list to duplicate and take
> > ownership of memory management of the strings it contains, the same is
> > sometimes true for strmaps as well.  Add the same flag from string_list
> > to strmap.
>
> This is actually one of the ugliest parts of string_list, IMHO, and I'd
> prefer if we can avoid duplicating it. Yes, sometimes we can manage to
> avoid an extra copy of a string. But the resulting ownership and
> lifetime questions are often very error-prone. In other data structures
> we've moved towards just having the structure own its data (e.g.,
> strvec does so, and things like oidmap store their own oids). I've been
> happy with the simplicity of it.
>
> It also works if you use a flex-array for the key storage in the
> strmap_entry. :)

I can see how it's easier, but that worries me about the number of
extra copies for my usecase.  In order to minimize actual computation,
I track an awful lot of auxiliary data in merge-ort so that I know
when I can safely perform many different case-specific optimizations.
Among other things, this means 15 strmaps.  1 of those stores a
mapping from all paths that traverse_trees() walks over (file or
directory) to metadata about the content on the three different sides.
9 of the remaining 14 simply share the strings in the main strmap,
because I don't need extra copies of the paths in the repository.  I
could (and maybe should) extend that to 11 of the 14.  Only 3 actually
do need to store a copy of the paths (because they store data used
beyond the end of an inner recursive merge or can be used to
accelerate subsequent commits in a rebase or cherry-pick sequence).

So, in most my cases, I don't want to duplicate strings.  I actually
started my implementation using FLEX_ALLOC_STR(), as you suggested
earlier in this thread, but tossed it because of this same desire to
not duplicate strings but just share them between the strmaps.

Granted, I made that decision before I had a complete implementation,
so I didn't measure the actual costs.  It's possible that was a
premature optimization.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 5/5] strmap: add functions facilitating use as a string->int map
  2020-08-21 20:10   ` Jeff King
@ 2020-08-21 20:51     ` Elijah Newren
  2020-08-21 21:05       ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-08-21 20:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 21, 2020 at 1:10 PM Jeff King <peff@peff.net> wrote:
>
> On Fri, Aug 21, 2020 at 06:52:29PM +0000, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > Although strmap could be used as a string->int map, one either had to
> > allocate an int for every entry and then deallocate later, or one had to
> > do a bunch of casting between (void*) and (intptr_t).
> >
> > Add some special functions that do the casting.  Also, rename put->set
> > for such wrapper functions since 'put' implied there may be some
> > deallocation needed if the string was already found in the map, which
> > isn't the case when we're storing an int value directly in the void*
> > slot instead of using the void* slot as a pointer to data.
>
> I think wrapping this kind of hackery is worth doing.
>
> You'd be able to use put() as usual, wouldn't you? It never deallocates
> the util field, but just returns the old one. And the caller knows that
> it's really an int, and shouldn't be deallocated.

You can use put() as normal, if you don't mind the need to explicitly
throw in a typecast when you use it.  In fact, strintmap_set() does no
more than typecasting the int to void* and otherwise calling
strmap_put().

I initially called that strintmap_put(), but got confused once or
twice and looked up the function definition to make sure there wasn't
some deallocation I needed to handle.  After that, I decided to just
rename to _set() because I thought it'd reduce the chance of myself or
others wondering about that in the future.

>
> > A note on the name: strintmap looks and sounds pretty lame to me, but
> > after trying to come up with something better and having no luck, I
> > figured I'd just go with it for a while and then at some point some
> > better and obvious name would strike me and I could replace it.  Several
> > months later, I still don't have a better name.  Hopefully someone else
> > has one.
>
> strnummap? That's pretty bad, too.
>
> Since the functions all take a raw strmap, you _could_ just do
> "strmap_getint()", etc. But I think you could actually get some
> additional safety by defining a wrapper type:
>
>   struct strintmap {
>           struct strmap strmap;
>   };
>
> It's a bit annoying because you a bunch of pass-through boilerplate for
> stuff like:
>
>   static inline int strintmap_empty(struct strintmap *map)
>   {
>           return strmap_empty(&map->map);
>   }
>
> but it would prevent mistakes like:
>
>   strintmap_incr(&map, "foo", 10);
>   strmap_clear(&map, 1);
>
> which would try to free (void *)10.  I'm not sure if that's worth it or
> not. You'd almost have to be trying to fail to pass "1" for free_util
> there. But I've seen stranger things. :)

I like this idea and the extra safety it provides.  Most of strintmap
is static inline functions anyway, adding a few more wouldn't hurt.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 4/5] strmap: add strdup_strings option
  2020-08-21 20:41     ` Elijah Newren
@ 2020-08-21 21:03       ` Jeff King
  2020-08-21 22:25         ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-08-21 21:03 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 21, 2020 at 01:41:44PM -0700, Elijah Newren wrote:

> > This is actually one of the ugliest parts of string_list, IMHO, and I'd
> > prefer if we can avoid duplicating it. Yes, sometimes we can manage to
> > avoid an extra copy of a string. But the resulting ownership and
> > lifetime questions are often very error-prone. In other data structures
> > we've moved towards just having the structure own its data (e.g.,
> > strvec does so, and things like oidmap store their own oids). I've been
> > happy with the simplicity of it.
> >
> > It also works if you use a flex-array for the key storage in the
> > strmap_entry. :)
> 
> I can see how it's easier, but that worries me about the number of
> extra copies for my usecase.  In order to minimize actual computation,
> I track an awful lot of auxiliary data in merge-ort so that I know
> when I can safely perform many different case-specific optimizations.
> Among other things, this means 15 strmaps.  1 of those stores a
> mapping from all paths that traverse_trees() walks over (file or
> directory) to metadata about the content on the three different sides.
> 9 of the remaining 14 simply share the strings in the main strmap,
> because I don't need extra copies of the paths in the repository.  I
> could (and maybe should) extend that to 11 of the 14.  Only 3 actually
> do need to store a copy of the paths (because they store data used
> beyond the end of an inner recursive merge or can be used to
> accelerate subsequent commits in a rebase or cherry-pick sequence).

I'd have to see the code, of course, but:

  - keep in mind you're allocating 8 bytes for a pointer (plus 24 for
    the rest of the strmap entry). If you use a flex-array you get those
    8 bytes back. Full paths do tend to be longer than that, so it's
    probably net worse than a pointer to an existing string. But how
    much worse, and does it matter?

  - That sounds like a lot of maps. :) I guess you've looked at
    compacting some of them into a single map-to-struct?

> So, in most my cases, I don't want to duplicate strings.  I actually
> started my implementation using FLEX_ALLOC_STR(), as you suggested
> earlier in this thread, but tossed it because of this same desire to
> not duplicate strings but just share them between the strmaps.
> 
> Granted, I made that decision before I had a complete implementation,
> so I didn't measure the actual costs.  It's possible that was a
> premature optimization.

I'm just really concerned that it poisons the data structure with
complexity that many of the other callers will have to deal with. We've
had several "oops, strdup_strings wasn't what I expected it to be" bugs
with string-list (in both directions: leaks and use-after-free). It
would be nice to have actual numbers and see if it's worth the cost.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 5/5] strmap: add functions facilitating use as a string->int map
  2020-08-21 20:51     ` Elijah Newren
@ 2020-08-21 21:05       ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-08-21 21:05 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 21, 2020 at 01:51:57PM -0700, Elijah Newren wrote:

> > I think wrapping this kind of hackery is worth doing.
> >
> > You'd be able to use put() as usual, wouldn't you? It never deallocates
> > the util field, but just returns the old one. And the caller knows that
> > it's really an int, and shouldn't be deallocated.
> 
> You can use put() as normal, if you don't mind the need to explicitly
> throw in a typecast when you use it.  In fact, strintmap_set() does no
> more than typecasting the int to void* and otherwise calling
> strmap_put().

Yeah, I think hiding the type-casting is worth it alone. I was just
confused by your remark.

> I initially called that strintmap_put(), but got confused once or
> twice and looked up the function definition to make sure there wasn't
> some deallocation I needed to handle.  After that, I decided to just
> rename to _set() because I thought it'd reduce the chance of myself or
> others wondering about that in the future.

Yeah, I'd agree that is a much better name. Since there's an "incr",
having a specific "set" makes it clear that we're overwriting.

> >   struct strintmap {
> >           struct strmap strmap;
> >   };
> [...]
> I like this idea and the extra safety it provides.  Most of strintmap
> is static inline functions anyway, adding a few more wouldn't hurt.

OK. Then I guess we can't cheat our way out of picking a name with
strmap_getint(). :)

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 0/5] Add struct strmap and associated utility functions
  2020-08-21 20:16 ` [PATCH 0/5] Add struct strmap and associated utility functions Jeff King
@ 2020-08-21 21:33   ` Elijah Newren
  2020-08-21 22:28     ` Elijah Newren
  2020-08-28  7:03     ` Jeff King
  0 siblings, 2 replies; 144+ messages in thread
From: Elijah Newren @ 2020-08-21 21:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 21, 2020 at 1:16 PM Jeff King <peff@peff.net> wrote:
>
> On Fri, Aug 21, 2020 at 06:52:24PM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Here I introduce a new strmap type, which my new merge backed, merge-ort,
> > uses heavily. (I also made significant use of it in my changes to
> > diffcore-rename). This strmap type was based on Peff's proposal from a
> > couple years ago[1], but has additions that I made as I used it. I also
> > start the series off with a quick documentation improvement to hashmap.c to
> > differentiate between hashmap_free() and hashmap_free_entries(), since I
> > personally had difficulty understanding them and it affects how
> > strmap_clear()/strmap_free() are written.
>
> I like the direction overall (unsurprisingly), but left a bunch of
> comments. I do think if we're going to do this that it may be worth
> cleaning up hashmap a bit first, especially around its clear/free
> semantics, and its ability to lazy-allocate the table.
>
> I'm happy to work on that, but don't want to step on your toes.

I have patches which introduce hashmap_clear() and
hashmap_clear_entries() to hashmap.[ch], which allowed me to simplify
strmap_clear(); instead of needing to call both
hashmap_free[_entries]() && strmap_init(), I could just call
hashmap_clear[_entries]().  Doing that surprised me with a significant
performance impact (in a good direction), at which point I started
adding mem-pool integration into hashmap for storing the entries that
hashmap.c allocates and got further good speedups.

I thought those were better explained when I got to the performance
stuff, so I had held off on those patches.  I could pull them out and
submit them first.

However, there's an important difference here between what I've done
and what you've suggested for hashmap: my method did not deallocate
hashmap->table in hashmap_clear() and then use lazy initialization.
In fact, I think not deallocating the table was part of the charm --
the table had already naturally grown to the right size, and because
the repository has approximately the same number of paths in various
commits, this provided me a way of getting a table preallocated to a
reasonable size for all merges after the first (and there are multiple
merges either when recursiveness is needed due to multiple merge
bases, OR when rebasing or cherry-picking a sequence of commits).
This prevented, as hashmap.h puts it, "expensive resizing".

So, once again, my performance ideas might be clashing with some of
your desires for the API.  Any clever ideas for resolving that?

Also, since you want to see hashmap cleanup first, should I submit
just the hashmap_clear[_entries()] stuff, or should I also submit the
API additions to allow mem-pool integration in hashmap (it's pretty
small and self-contained, but it'll be a while before I submit the
patches that use it...)?

> I also wonder if you looked at the khash stuff at all. Especially for
> storing integers, it makes things much more natural. You'd do something
> like:
>
>   /* you might even be able to just write !strcmp in the macro below */
>   static inline int streq(const char *a, const char *b)
>   {
>           return !strcmp(a, b);
>   }
>
>   KHASH_INIT(strint_map, char *, int, 1, strhash, streq);
>
> and then you'd probably want a "put" wrapper that makes a copy of the
> string. khash has its own charming awkwardness, but I'm just curious if you
> looked at it and found it more awkward than hashmap.c, or if you just
> didn't look at it.

I did look at it, but only briefly.  I had a further investigation on
my TODO list for months, along with several other improvement ideas.
But it seemed like my TODO list was really long, and my new merge
backend hasn't benefited anyone yet.  At some point, I decided to punt
on it and other ideas and start cleaning up my code and submitting.  I
believe merge-ort is more accurate than merge-recursive (it fixes
several test_expect_failures) and is a lot faster as well for the
cases I'm looking at.  So, for now, I've pulled it off my radar.

But I'd be really happy if someone else wanted to jump in and try
switching out hashmap for khash in the strmap API and see if it helps
merge-ort performance.  :-)

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 4/5] strmap: add strdup_strings option
  2020-08-21 21:03       ` Jeff King
@ 2020-08-21 22:25         ` Elijah Newren
  2020-08-28  7:08           ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-08-21 22:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 21, 2020 at 2:03 PM Jeff King <peff@peff.net> wrote:
>
> On Fri, Aug 21, 2020 at 01:41:44PM -0700, Elijah Newren wrote:
>
> > > This is actually one of the ugliest parts of string_list, IMHO, and I'd
> > > prefer if we can avoid duplicating it. Yes, sometimes we can manage to
> > > avoid an extra copy of a string. But the resulting ownership and
> > > lifetime questions are often very error-prone. In other data structures
> > > we've moved towards just having the structure own its data (e.g.,
> > > strvec does so, and things like oidmap store their own oids). I've been
> > > happy with the simplicity of it.
> > >
> > > It also works if you use a flex-array for the key storage in the
> > > strmap_entry. :)
> >
> > I can see how it's easier, but that worries me about the number of
> > extra copies for my usecase.  In order to minimize actual computation,
> > I track an awful lot of auxiliary data in merge-ort so that I know
> > when I can safely perform many different case-specific optimizations.
> > Among other things, this means 15 strmaps.  1 of those stores a
> > mapping from all paths that traverse_trees() walks over (file or
> > directory) to metadata about the content on the three different sides.
> > 9 of the remaining 14 simply share the strings in the main strmap,
> > because I don't need extra copies of the paths in the repository.  I
> > could (and maybe should) extend that to 11 of the 14.  Only 3 actually
> > do need to store a copy of the paths (because they store data used
> > beyond the end of an inner recursive merge or can be used to
> > accelerate subsequent commits in a rebase or cherry-pick sequence).
>
> I'd have to see the code, of course, but:

>   - keep in mind you're allocating 8 bytes for a pointer (plus 24 for
>     the rest of the strmap entry). If you use a flex-array you get those
>     8 bytes back. Full paths do tend to be longer than that, so it's
>     probably net worse than a pointer to an existing string. But how
>     much worse, and does it matter?

I'll investigate; it may take a while...

>   - That sounds like a lot of maps. :) I guess you've looked at
>     compacting some of them into a single map-to-struct?

Oh, map-to-struct is the primary use.  But compacting them won't work,
because the reason for the additional maps is that they have different
sets of keys (this set of paths meet a certain condition...).  Only
one map contains all the paths involved in the merge.

Also, several of those maps don't even store a value; and are really
just a set implemented via strmap (thus meaning the only bit of data I
need for some conditions is whether any given path meets it).  It
seems slightly ugly to have to call strmap_put(map, string, NULL) for
those.  I wonder if I should have another strset type much like your
suggesting for strintmap.  Hmm...

Also, one thing that inflates the number of strmaps I use is that
several of those conditions are specific to a certain side of the
merge, thus requiring two strmaps for each of those special
conditions.

> > So, in most my cases, I don't want to duplicate strings.  I actually
> > started my implementation using FLEX_ALLOC_STR(), as you suggested
> > earlier in this thread, but tossed it because of this same desire to
> > not duplicate strings but just share them between the strmaps.
> >
> > Granted, I made that decision before I had a complete implementation,
> > so I didn't measure the actual costs.  It's possible that was a
> > premature optimization.
>
> I'm just really concerned that it poisons the data structure with
> complexity that many of the other callers will have to deal with. We've
> had several "oops, strdup_strings wasn't what I expected it to be" bugs
> with string-list (in both directions: leaks and use-after-free). It
> would be nice to have actual numbers and see if it's worth the cost.

I'll go get some and find out what the impact is.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 0/5] Add struct strmap and associated utility functions
  2020-08-21 21:33   ` Elijah Newren
@ 2020-08-21 22:28     ` Elijah Newren
  2020-08-28  7:03     ` Jeff King
  1 sibling, 0 replies; 144+ messages in thread
From: Elijah Newren @ 2020-08-21 22:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 21, 2020 at 2:33 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Fri, Aug 21, 2020 at 1:16 PM Jeff King <peff@peff.net> wrote:
> >
> > On Fri, Aug 21, 2020 at 06:52:24PM +0000, Elijah Newren via GitGitGadget wrote:
> >
> > > Here I introduce a new strmap type, which my new merge backed, merge-ort,
> > > uses heavily. (I also made significant use of it in my changes to
> > > diffcore-rename). This strmap type was based on Peff's proposal from a
> > > couple years ago[1], but has additions that I made as I used it. I also
> > > start the series off with a quick documentation improvement to hashmap.c to
> > > differentiate between hashmap_free() and hashmap_free_entries(), since I
> > > personally had difficulty understanding them and it affects how
> > > strmap_clear()/strmap_free() are written.
> >
> > I like the direction overall (unsurprisingly), but left a bunch of
> > comments. I do think if we're going to do this that it may be worth
> > cleaning up hashmap a bit first, especially around its clear/free
> > semantics, and its ability to lazy-allocate the table.
> >
> > I'm happy to work on that, but don't want to step on your toes.
>
> I have patches which introduce hashmap_clear() and
> hashmap_clear_entries() to hashmap.[ch], which allowed me to simplify
> strmap_clear(); instead of needing to call both
> hashmap_free[_entries]() && strmap_init(), I could just call
> hashmap_clear[_entries]().  Doing that surprised me with a significant
> performance impact (in a good direction), at which point I started
> adding mem-pool integration into hashmap for storing the entries that
> hashmap.c allocates and got further good speedups.
>
> I thought those were better explained when I got to the performance
> stuff, so I had held off on those patches.  I could pull them out and
> submit them first.
>
> However, there's an important difference here between what I've done
> and what you've suggested for hashmap: my method did not deallocate
> hashmap->table in hashmap_clear() and then use lazy initialization.
> In fact, I think not deallocating the table was part of the charm --
> the table had already naturally grown to the right size, and because
> the repository has approximately the same number of paths in various
> commits, this provided me a way of getting a table preallocated to a
> reasonable size for all merges after the first (and there are multiple
> merges either when recursiveness is needed due to multiple merge
> bases, OR when rebasing or cherry-picking a sequence of commits).
> This prevented, as hashmap.h puts it, "expensive resizing".
>
> So, once again, my performance ideas might be clashing with some of
> your desires for the API.  Any clever ideas for resolving that?
>
> Also, since you want to see hashmap cleanup first, should I submit
> just the hashmap_clear[_entries()] stuff, or should I also submit the
> API additions to allow mem-pool integration in hashmap (it's pretty
> small and self-contained, but it'll be a while before I submit the
> patches that use it...)?

Nevermind, I mis-remembered.  The mempool integration was added
specifically to strmap, not to hashmap, because strmap_put() does the
allocation of the str_entry.  So I'll just pull out the
hashmap_clear[_entries]() stuff and send it up.

>
> > I also wonder if you looked at the khash stuff at all. Especially for
> > storing integers, it makes things much more natural. You'd do something
> > like:
> >
> >   /* you might even be able to just write !strcmp in the macro below */
> >   static inline int streq(const char *a, const char *b)
> >   {
> >           return !strcmp(a, b);
> >   }
> >
> >   KHASH_INIT(strint_map, char *, int, 1, strhash, streq);
> >
> > and then you'd probably want a "put" wrapper that makes a copy of the
> > string. khash has its own charming awkwardness, but I'm just curious if you
> > looked at it and found it more awkward than hashmap.c, or if you just
> > didn't look at it.
>
> I did look at it, but only briefly.  I had a further investigation on
> my TODO list for months, along with several other improvement ideas.
> But it seemed like my TODO list was really long, and my new merge
> backend hasn't benefited anyone yet.  At some point, I decided to punt
> on it and other ideas and start cleaning up my code and submitting.  I
> believe merge-ort is more accurate than merge-recursive (it fixes
> several test_expect_failures) and is a lot faster as well for the
> cases I'm looking at.  So, for now, I've pulled it off my radar.
>
> But I'd be really happy if someone else wanted to jump in and try
> switching out hashmap for khash in the strmap API and see if it helps
> merge-ort performance.  :-)

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 0/5] Add struct strmap and associated utility functions
  2020-08-21 21:33   ` Elijah Newren
  2020-08-21 22:28     ` Elijah Newren
@ 2020-08-28  7:03     ` Jeff King
  2020-08-28 15:29       ` Elijah Newren
  1 sibling, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-08-28  7:03 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 21, 2020 at 02:33:54PM -0700, Elijah Newren wrote:

> However, there's an important difference here between what I've done
> and what you've suggested for hashmap: my method did not deallocate
> hashmap->table in hashmap_clear() and then use lazy initialization.
> In fact, I think not deallocating the table was part of the charm --
> the table had already naturally grown to the right size, and because
> the repository has approximately the same number of paths in various
> commits, this provided me a way of getting a table preallocated to a
> reasonable size for all merges after the first (and there are multiple
> merges either when recursiveness is needed due to multiple merge
> bases, OR when rebasing or cherry-picking a sequence of commits).
> This prevented, as hashmap.h puts it, "expensive resizing".
> 
> So, once again, my performance ideas might be clashing with some of
> your desires for the API.  Any clever ideas for resolving that?

If the magic is in pre-sizing the hash, then it seems like the callers
ought to be feeding the size hint. That does make a little more work for
them, but I think there's real value in having consistent semantics for
"clear" across our data structures.

However, one cheat would be to free the memory but retain the size hint
after a clear. And then if we lazy-init, grow immediately to the hint
size. That's more expensive than a true reuse, because we do reallocate
the memory. But it avoids the repeated re-allocation during growth.

It may also be a sign that we should be growing the hash more
aggressively in the first place. Of course all of this is predicated
having some benchmarks. It would be useful to know which part actually
provided the speedup.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 4/5] strmap: add strdup_strings option
  2020-08-21 22:25         ` Elijah Newren
@ 2020-08-28  7:08           ` Jeff King
  2020-08-28 17:20             ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-08-28  7:08 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 21, 2020 at 03:25:44PM -0700, Elijah Newren wrote:

> >   - That sounds like a lot of maps. :) I guess you've looked at
> >     compacting some of them into a single map-to-struct?
> 
> Oh, map-to-struct is the primary use.  But compacting them won't work,
> because the reason for the additional maps is that they have different
> sets of keys (this set of paths meet a certain condition...).  Only
> one map contains all the paths involved in the merge.

OK, I guess I'm not surprised that you would not have missed such an
obvious optimization. :)

> Also, several of those maps don't even store a value; and are really
> just a set implemented via strmap (thus meaning the only bit of data I
> need for some conditions is whether any given path meets it).  It
> seems slightly ugly to have to call strmap_put(map, string, NULL) for
> those.  I wonder if I should have another strset type much like your
> suggesting for strintmap.  Hmm...

FWIW, khash does have a "set" mode where it avoids allocating the value
array at all.

What's the easiest way to benchmark merge-ort? I suspect I could swap
out hashmap for khash (messily) in an hour or less.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 0/5] Add struct strmap and associated utility functions
  2020-08-28  7:03     ` Jeff King
@ 2020-08-28 15:29       ` Elijah Newren
  2020-09-01  9:27         ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-08-28 15:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 28, 2020 at 12:03 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Aug 21, 2020 at 02:33:54PM -0700, Elijah Newren wrote:
>
> > However, there's an important difference here between what I've done
> > and what you've suggested for hashmap: my method did not deallocate
> > hashmap->table in hashmap_clear() and then use lazy initialization.
> > In fact, I think not deallocating the table was part of the charm --
> > the table had already naturally grown to the right size, and because
> > the repository has approximately the same number of paths in various
> > commits, this provided me a way of getting a table preallocated to a
> > reasonable size for all merges after the first (and there are multiple
> > merges either when recursiveness is needed due to multiple merge
> > bases, OR when rebasing or cherry-picking a sequence of commits).
> > This prevented, as hashmap.h puts it, "expensive resizing".
> >
> > So, once again, my performance ideas might be clashing with some of
> > your desires for the API.  Any clever ideas for resolving that?
>
> If the magic is in pre-sizing the hash, then it seems like the callers
> ought to be feeding the size hint. That does make a little more work for
> them, but I think there's real value in having consistent semantics for
> "clear" across our data structures.

I thought about adding a size hint from the callers, but the thing is
I don't know how to get a good one short of running a merge and
querying how big things were sized in that merge.  (In some common
cases I can get an upper bound, but I can't get it in all cases and
that upper bound might be a couple orders of magnitude too big.)
Thus, it's really a case where I just punt on pre-sizing for the first
merge, and use the size from the previous merge for subsequent ones.
If you have a non-recursive merge or are cherry-picking only a single
commit, then no sizing hint is used.

> However, one cheat would be to free the memory but retain the size hint
> after a clear. And then if we lazy-init, grow immediately to the hint
> size. That's more expensive than a true reuse, because we do reallocate
> the memory. But it avoids the repeated re-allocation during growth.
>
> It may also be a sign that we should be growing the hash more
> aggressively in the first place. Of course all of this is predicated
> having some benchmarks. It would be useful to know which part actually
> provided the speedup.

Your thoughts here are great; I also had another one this past week --
I could introduce a hashmap_partial_clear() (in addition to
hashmap_clear()) for the special usecase I have of leaving the table
allocated and pre-sized.  It'd prevent people from accidentally using
it and forgetting to free stuff, while still allowing me to take
advantage.  But, as you say, more benchmarks would be useful to find
which parts provided the speedup before taking any of these steps.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 4/5] strmap: add strdup_strings option
  2020-08-28  7:08           ` Jeff King
@ 2020-08-28 17:20             ` Elijah Newren
  0 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren @ 2020-08-28 17:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 28, 2020 at 12:08 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Aug 21, 2020 at 03:25:44PM -0700, Elijah Newren wrote:
>
> > >   - That sounds like a lot of maps. :) I guess you've looked at
> > >     compacting some of them into a single map-to-struct?
> >
> > Oh, map-to-struct is the primary use.  But compacting them won't work,
> > because the reason for the additional maps is that they have different
> > sets of keys (this set of paths meet a certain condition...).  Only
> > one map contains all the paths involved in the merge.
>
> OK, I guess I'm not surprised that you would not have missed such an
> obvious optimization. :)
>
> > Also, several of those maps don't even store a value; and are really
> > just a set implemented via strmap (thus meaning the only bit of data I
> > need for some conditions is whether any given path meets it).  It
> > seems slightly ugly to have to call strmap_put(map, string, NULL) for
> > those.  I wonder if I should have another strset type much like your
> > suggesting for strintmap.  Hmm...
>
> FWIW, khash does have a "set" mode where it avoids allocating the value
> array at all.

Cool.

> What's the easiest way to benchmark merge-ort?

Note that I discovered another optimization that I'm working on
implementing; when finished, it should cut down a little more on the
time spent on inexact rename detection.  That should have the side
effect of having the time spent on strmaps stick out some more in the
overall timings (as a percentage of overall time anyway).  So, I'm
focused on that before I do other benchmarking work (which is part of
the reason I mentioned my strmap/hashmap benchmarking last week might
take a while).

Anyway, on to your question:

=== If you just want to be able to run the ort merge algorithm ===

Clone git@github.com:newren/git and checkout the 'ort' branch and
build it.  It currently changes the default merge algorithm to 'ort'
and even ignores '-s recursive' by remapping it to '-s ort' (because I
wanted to see how regression tests fared with ort as a replacement for
recrusive).  It should pass the regression tests if you want to run
those first.  But note that if you want to compare 'ort' to
'recursive', then currently you need to have two different git builds,
one of my branch and one with a different checkout of something else
(e.g. 2.28.0 or 'master' or whatever).

=== Decide the granularity of your timing ===

I suspect you know more than me here, but maybe my pointers are useful anyway...

Decide if you want to measure overall program runtime, or dive into
details.  I used both a simple 'time' and the better 'hyperfine' for
the former, and used both 'perf' and GIT_TRACE2_PERF for the latter.
One nice thing about GIT_TRACE2_PERF was I wrote a simple program to
aggregate the times per region and provide percentages, in a script at
the toplevel named 'summarize-perf' that I can use to prefix commands.
Thus, I could for example run from my linux clone:
    $ ../git/summarize-perf git fast-rebase --onto HEAD base hwmon-updates
and I'd get output that looks something like (note that this is a
subset of the real output):
    1.400 : 35 : label:inmemory_nonrecursive
       0.827 : 41 : ..label:renames
          0.019 : <unmeasured> ( 2.2%)
          0.803 : 37 : ....label:regular renames
          0.004 : 31 : ....label:directory renames
          0.001 : 31 : ....label:process renames
       0.513 : 41 : ..label:collect_merge_info
       0.048 : 35 : ..label:process_entries
    0.117 : 1 : label:checkout
    0.000 : 1 : label:record_unmerged
and where those fields are <time> : <count> : <region label>.

=== If you want to time the testcases I used heavily while developing ===

The rebase-testcase/redo-timings script (in the ort branch) has
details on what I actually ran, though it has some paranoia around
attempting to make my laptop run semi-quietly and try to avoid all the
variance that I wished I could control a bit better.  And it assumes
you are running in a linux clone with a few branches set up a certain
way.  Let me explain those tests without using that script, as simply
as I can:

The setup for the particular cases I was testing is as follows:
  * Clone the linux kernel, and run the following:
  $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e
  $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34
  $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e
  $ git switch -c 5.4-renames v5.4
  $ git mv drivers pilots
  $ git commit -m "Rename drivers/ to pilots/"

And from there, there were three primary tests I was comparing:

  * Rename testcase, 35 patches:
  $ git checkout 5.4-renames^0
  $ git fast-rebase --onto HEAD base hwmon-updates

  * Rename testcase, just 1 patch:
  $ git switch 5.4-renames^0
  $ git fast-rebase --onto HEAD base hwmon-just-one

  * No renames (or at least very few renames) testcase, 35 patches:
  $ git checkout v5.4^0
  $ git branch -f hwmon-updates
fd8bdb23b91876ac1e624337bb88dc1dcc21d67e # Need to reset
hwmon-updates, due to fast-rebase done above
  $ git fast-rebase --onto HEAD base hwmon-updates

(If you want to compare with 'recursive' from a different build of
git, just replace 'fast-rebase' with 'rebase'.  You can also use
'rebase' instead of 'fast-rebase' on the ort branch and it'll use the
ort merge algorithm, but you get all the annoying
working-tree-updates-while-rebasing rather than just having the
working tree updated at the end of the rebase.  You also get all the
annoying forks of 'git checkout' and 'git commit' that sequencer is
guilty of spawning.  But it certainly supports a lot more options and
can save state to allow resuming after conflicts, unlike
'fast-rebase'.)

> I suspect I could swap out hashmap for khash (messily) in an hour or less.

Well, you might be assuming I used sane strmaps, with each strmap
having a fixed type for the stored value.  That's mostly true, but
there were two counterexamples I can think of: "paths" (the biggest
strmap) is a map of string -> {merged_info OR conflict_info}, because
merged_info is a smaller subset of conflict_info and saves space for
each path that can be trivially merged.  Also, in diffcore-rename,
"dir_rename" starts life as a map of string -> strmap, but later
transitions to string -> string, because I'm evil and didn't set up a
temporary strmap like I probably should have.

Also, the code is littered with FIXME comments, unnecessary #ifdefs,
and is generally in need of lots of cleanup.  Sorry.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH 0/5] Add struct strmap and associated utility functions
  2020-08-28 15:29       ` Elijah Newren
@ 2020-09-01  9:27         ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-09-01  9:27 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Aug 28, 2020 at 08:29:44AM -0700, Elijah Newren wrote:

> > It may also be a sign that we should be growing the hash more
> > aggressively in the first place. Of course all of this is predicated
> > having some benchmarks. It would be useful to know which part actually
> > provided the speedup.
> 
> Your thoughts here are great; I also had another one this past week --
> I could introduce a hashmap_partial_clear() (in addition to
> hashmap_clear()) for the special usecase I have of leaving the table
> allocated and pre-sized.  It'd prevent people from accidentally using
> it and forgetting to free stuff, while still allowing me to take
> advantage.  But, as you say, more benchmarks would be useful to find
> which parts provided the speedup before taking any of these steps.

Yeah, having a separate function to explicitly do "remove all elements
but keep the table allocated" would be fine with me. My big desire is
that clear() should do the safe, non-leaking thing by default.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 00/10] Add struct strmap and associated utility functions
  2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                   ` (5 preceding siblings ...)
  2020-08-21 20:16 ` [PATCH 0/5] Add struct strmap and associated utility functions Jeff King
@ 2020-10-13  0:40 ` Elijah Newren via GitGitGadget
  2020-10-13  0:40   ` [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
                     ` (10 more replies)
  6 siblings, 11 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren

Here I introduce a new strmap type, which my new merge backed, merge-ort,
uses heavily. (I also made significant use of it in my changes to
diffcore-rename). This strmap type was based on Peff's proposal from a
couple years ago[1], but has additions that I made as I used it. I also
start the series off with some changes to hashmap, based on Peff's feedback
on v1.

Changes since v1:

 * Rebased on newer origin/master (to resolve a conflict in Makefile)
 * First fixed hashmap to allow it to continue to be used after
   hashmap_free() is called, as requested by Peff.
 * Renamed my hashmap_clear() to hashmap_partial_clear() to avoid mis-use
   and make it clear it doesn't free everything.
 * Distanced the API from string-list, as per feedback from Peff. In
   particular, removed the use of string_list_item, and made strdup_strings
   the default -- strmap_init() does not accept a strdup_strings parameter
   it just defaults to on; one has to call strmap_ocd_init() instead if one
   wants to control the memory closely.
 * Added strintmap and strset types, for string->int mappings, and cases
   where we just want a set of keys rather than having each key map to some
   value.
 * Also included a patch enabling strmaps to make use of mem_pools (also
   only accessible via the strmap_ocd_init() constructor ); I previously
   thought this only made sense to include after the relevant point in
   merge-ort, but I figured since it slightly affects the API (it was part
   of what led me to the strmap_ocd_init() name), I decided to include it
   for now.
 * The hashmap_partial_clear() and strmap_partial_clear() additions in
   patches 4 and 7 could also potentially be deferred much like mem_pool
   additions, but again, I decided to include them because they give a
   better picture of the overall usage I have and usecases I'm trying to
   design the API for.

Things Peff mentioned that are NOT included in this v2:

 * Peff brought up the idea of having a free_values member instead of having
   a free_util parameter to strmap_clear(). I think that would just mean
   moving the parameter from strmap_clear() to strmap_init().
 * Peff wanted the strmap_entry type to have a char key[FLEX_ALLOC] instead
   of having a (const) char *key.

Explanations/excuses for the above: 

 * For the free_values member, it sounded like Peff didn't have a strong
   opinion. I don't either, so I'm happy to switch it if someone feels
   strongly. But since it sounded like a thinking-out-loud comment and I
   couldn't see an advantage one way or the other, I left things as-is.
 * For the FLEX_ALLOC implementation, Peff did have a clear strong
   preference. I put a day or so of time into trying to get an alternate
   implementation working (and I at least ripped out the string_list_item
   sub-type, made the key be const, and fixed the ugly casts), but didn't
   quite get the code to a working state. (Not only does it change the
   memory management quite a bit in ways that I need to run under valgrind
   both with and without mem_pools, but there's an extra wrinkle as well:
   merge-ort differs from merge-recursive in that all filepaths come from a
   single tree traversal instead of several different ones, which means that
   I had the ability to compare path strings for equality via pointer
   comparisons instead of using strcmp. Copying strings for each of the
   different strmaps breaks that.) If the changes I've made aren't
   sufficient and folks still want to see the performance of a FLEX_ALLOC
   implementation, it's probably possible to get it working, just
   surprisingly non-trivial. For now, I at least wanted to get feedback on
   my other changes, and probe for whether folks do want me to put another
   day or two into this.

[1] 
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Elijah Newren (10):
  hashmap: add usage documentation explaining hashmap_free[_entries]()
  hashmap: adjust spacing to fix argument alignment
  hashmap: allow re-use after hashmap_free()
  hashmap: introduce a new hashmap_partial_clear()
  strmap: new utility functions
  strmap: add more utility functions
  strmap: enable faster clearing and reusing of strmaps
  strmap: add functions facilitating use as a string->int map
  strmap: add a strset sub-type
  strmap: enable allocations to come from a mem_pool

 Makefile  |   1 +
 hashmap.c |  72 +++++++++++-----
 hashmap.h |  44 +++++++++-
 strmap.c  | 152 +++++++++++++++++++++++++++++++++
 strmap.h  | 250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 494 insertions(+), 25 deletions(-)
 create mode 100644 strmap.c
 create mode 100644 strmap.h


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-835%2Fnewren%2Fstrmap-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-835/newren/strmap-v2
Pull-Request: https://github.com/git/git/pull/835

Range-diff vs v1:

  1:  b295e9393a !  1:  af6b6fcb46 hashmap: add usage documentation explaining hashmap_free[_entries]()
     @@ Commit message
          hashmap: add usage documentation explaining hashmap_free[_entries]()
      
          The existence of hashmap_free() and hashmap_free_entries() confused me,
     -    and the docs weren't clear enough.  I had to consult other source code
     -    examples and the implementation.  Add a brief note to clarify,
     -    especially since hashmap_clear*() variants may be added in the future.
     +    and the docs weren't clear enough.  We are dealing with a map table,
     +    entries in that table, and possibly also things each of those entries
     +    point to.  I had to consult other source code examples and the
     +    implementation.  Add a brief note to clarify the differences.  This will
     +    become even more important once we introduce a new
     +    hashmap_partial_clear() function which will add the question of whether
     +    the table itself has been freed.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ hashmap.h: void hashmap_init(struct hashmap *map,
      + * Many callers will need to iterate over all entries and free the data each
      + * entry points to; in such a case, they can free the entry itself while at it.
      + * Thus, you might see:
     ++ *
      + *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
      + *      free(e->somefield);
      + *      free(e);
      + *    }
      + *    hashmap_free(map);
     ++ *
      + * instead of
     ++ *
      + *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
      + *      free(e->somefield);
      + *    }
      + *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
     ++ *
      + * to avoid the implicit extra loop over the entries.  However, if there are
      + * no special fields in your entry that need to be freed beyond the entry
      + * itself, it is probably simpler to avoid the explicit loop and just call
  -:  ---------- >  2:  75f17619e9 hashmap: adjust spacing to fix argument alignment
  -:  ---------- >  3:  a686d0758a hashmap: allow re-use after hashmap_free()
  -:  ---------- >  4:  061ab45a9b hashmap: introduce a new hashmap_partial_clear()
  2:  a86fd5fdcc !  5:  5c7507f55b strmap: new utility functions
     @@ Commit message
          taken directly from Peff's proposal at
          https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/
      
     -    Peff only included the header, not the implementation, so it isn't clear what
     -    the structure was he was going to use for the hash entries.  Instead of having
     -    my str_entry struct have three subfields (the hashmap_entry, the string, and
     -    the void* value), I made it only have two -- the hashmap_entry and a
     -    string_list_item, for two reasons:
     +    A couple of items of note:
      
     -      1) a strmap is often the data structure we want where string_list has
     -         been used in the past.  Using the same building block for
     -         individual entries in both makes it easier to adopt and reuse
     -         parts of the string_list API in strmap.
     +      * Similar to string-list, I have a strdup_strings setting.  However,
     +        unlike string-list, strmap_init() does not take a parameter for this
     +        setting and instead automatically sets it to 1; callers who want to
     +        control this detail need to instead call strmap_ocd_init().
      
     -      2) In some cases, after doing lots of other work, I want to be able
     -         to iterate over the items in my strmap in sorted order.  hashmap
     -         obviously doesn't support that, but I wanted to be able to export
     -         the strmap to a string_list easily and then use its functions.
     -         (Note: I do not need the data structure to both be sorted and have
     -         efficient lookup at all times.  If I did, I might use a B-tree
     -         instead, as suggested by brian in response to Peff in the thread
     -         noted above.  In my case, most strmaps will never need sorting, but
     -         in one special case at the very end of a bunch of other work I want
     -         to iterate over the items in sorted order without doing any more
     -         lookups afterward.)
     -
     -    Also, I removed the STRMAP_INIT macro, since it cannot be used to
     -    correctly initialize a strmap; the underlying hashmap needs a call to
     -    hashmap_init() to allocate the hash table first.
     +      * I do not have a STRMAP_INIT macro.  I could possibly add one, but
     +          #define STRMAP_INIT { { NULL, cmp_str_entry, NULL, 0, 0, 0, 0, 0 }, 1 }
     +        feels a bit unwieldy and possibly error-prone in terms of future
     +        expansion of the hashmap struct.  The fact that cmp_str_entry needs to
     +        be in there prevents us from passing all zeros for the hashmap, and makes
     +        me worry that STRMAP_INIT would just be more trouble than it is worth.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## Makefile ##
     -@@ Makefile: LIB_OBJS += strbuf.o
     - LIB_OBJS += strvec.o
     +@@ Makefile: LIB_OBJS += stable-qsort.o
     + LIB_OBJS += strbuf.o
       LIB_OBJS += streaming.o
       LIB_OBJS += string-list.o
      +LIB_OBJS += strmap.o
     + LIB_OBJS += strvec.o
       LIB_OBJS += sub-process.o
       LIB_OBJS += submodule-config.o
     - LIB_OBJS += submodule.o
      
       ## strmap.c (new) ##
      @@
      +#include "git-compat-util.h"
      +#include "strmap.h"
      +
     -+static int cmp_str_entry(const void *hashmap_cmp_fn_data,
     -+			 const struct hashmap_entry *entry1,
     -+			 const struct hashmap_entry *entry2,
     -+			 const void *keydata)
     ++static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
     ++			    const struct hashmap_entry *entry1,
     ++			    const struct hashmap_entry *entry2,
     ++			    const void *keydata)
      +{
     -+	const struct str_entry *e1, *e2;
     ++	const struct strmap_entry *e1, *e2;
      +
     -+	e1 = container_of(entry1, const struct str_entry, ent);
     -+	e2 = container_of(entry2, const struct str_entry, ent);
     -+	return strcmp(e1->item.string, e2->item.string);
     ++	e1 = container_of(entry1, const struct strmap_entry, ent);
     ++	e2 = container_of(entry2, const struct strmap_entry, ent);
     ++	return strcmp(e1->key, e2->key);
      +}
      +
     -+static struct str_entry *find_str_entry(struct strmap *map,
     -+					const char *str)
     ++static struct strmap_entry *find_strmap_entry(struct strmap *map,
     ++					      const char *str)
      +{
     -+	struct str_entry entry;
     ++	struct strmap_entry entry;
      +	hashmap_entry_init(&entry.ent, strhash(str));
     -+	entry.item.string = (char *)str;
     ++	entry.key = str;
      +	return hashmap_get_entry(&map->map, &entry, ent, NULL);
      +}
      +
      +void strmap_init(struct strmap *map)
      +{
     -+	hashmap_init(&map->map, cmp_str_entry, NULL, 0);
     ++	strmap_ocd_init(map, 1);
      +}
      +
     -+void strmap_clear(struct strmap *map, int free_util)
     ++void strmap_ocd_init(struct strmap *map,
     ++		     int strdup_strings)
     ++{
     ++	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
     ++	map->strdup_strings = strdup_strings;
     ++}
     ++
     ++static void strmap_free_entries_(struct strmap *map, int free_util)
      +{
      +	struct hashmap_iter iter;
     -+	struct str_entry *e;
     ++	struct strmap_entry *e;
      +
      +	if (!map)
      +		return;
      +
     -+	hashmap_for_each_entry(&map->map, &iter, e, ent /* member name */) {
     -+		free(e->item.string);
     ++	/*
     ++	 * We need to iterate over the hashmap entries and free
     ++	 * e->key and e->value ourselves; hashmap has no API to
     ++	 * take care of that for us.  Since we're already iterating over
     ++	 * the hashmap, though, might as well free e too and avoid the need
     ++	 * to make some call into the hashmap API to do that.
     ++	 */
     ++	hashmap_for_each_entry(&map->map, &iter, e, ent) {
      +		if (free_util)
     -+			free(e->item.util);
     ++			free(e->value);
     ++		if (map->strdup_strings)
     ++			free((char*)e->key);
     ++		free(e);
      +	}
     -+	hashmap_free_entries(&map->map, struct str_entry, ent);
     -+	strmap_init(map);
      +}
      +
     -+/*
     -+ * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
     -+ * it does not need to persist after the this function is called.
     -+ *
     -+ * If an entry for "str" already exists, its data pointer is overwritten, and
     -+ * the original data pointer returned. Otherwise, returns NULL.
     -+ */
     ++void strmap_clear(struct strmap *map, int free_util)
     ++{
     ++	strmap_free_entries_(map, free_util);
     ++	hashmap_free(&map->map);
     ++}
     ++
      +void *strmap_put(struct strmap *map, const char *str, void *data)
      +{
     -+	struct str_entry *entry = find_str_entry(map, str);
     ++	struct strmap_entry *entry = find_strmap_entry(map, str);
      +	void *old = NULL;
      +
      +	if (entry) {
     -+		old = entry->item.util;
     -+		entry->item.util = data;
     ++		old = entry->value;
     ++		entry->value = data;
      +	} else {
     ++		/*
     ++		 * We won't modify entry->key so it really should be const.
     ++		 */
     ++		const char *key = str;
     ++
      +		entry = xmalloc(sizeof(*entry));
      +		hashmap_entry_init(&entry->ent, strhash(str));
     -+		entry->item.string = strdup(str);
     -+		entry->item.util = data;
     ++
     ++		if (map->strdup_strings)
     ++			key = xstrdup(str);
     ++		entry->key = key;
     ++		entry->value = data;
      +		hashmap_add(&map->map, &entry->ent);
      +	}
      +	return old;
     @@ strmap.c (new)
      +
      +void *strmap_get(struct strmap *map, const char *str)
      +{
     -+	struct str_entry *entry = find_str_entry(map, str);
     -+	return entry ? entry->item.util : NULL;
     ++	struct strmap_entry *entry = find_strmap_entry(map, str);
     ++	return entry ? entry->value : NULL;
      +}
      +
      +int strmap_contains(struct strmap *map, const char *str)
      +{
     -+	return find_str_entry(map, str) != NULL;
     ++	return find_strmap_entry(map, str) != NULL;
      +}
      
       ## strmap.h (new) ##
     @@ strmap.h (new)
      +#define STRMAP_H
      +
      +#include "hashmap.h"
     -+#include "string-list.h"
      +
      +struct strmap {
      +	struct hashmap map;
     ++	unsigned int strdup_strings:1;
      +};
      +
     -+struct str_entry {
     ++struct strmap_entry {
      +	struct hashmap_entry ent;
     -+	struct string_list_item item;
     ++	const char *key;
     ++	void *value;
      +};
      +
      +/*
     -+ * Initialize an empty strmap
     ++ * Initialize the members of the strmap.  Any keys added to the strmap will
     ++ * be strdup'ed with their memory managed by the strmap.
      + */
      +void strmap_init(struct strmap *map);
      +
      +/*
     ++ * Same as strmap_init, but for those who want to control the memory management
     ++ * carefully instead of using the default of strdup_strings=1.
     ++ * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
     ++ * are obsessing over minor details.)
     ++ */
     ++void strmap_ocd_init(struct strmap *map,
     ++		     int strdup_strings);
     ++
     ++/*
      + * Remove all entries from the map, releasing any allocated resources.
      + */
      +void strmap_clear(struct strmap *map, int free_values);
      +
      +/*
     -+ * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
     -+ * it does not need to persist after the this function is called.
     ++ * Insert "str" into the map, pointing to "data".
      + *
      + * If an entry for "str" already exists, its data pointer is overwritten, and
      + * the original data pointer returned. Otherwise, returns NULL.
  3:  5bda171d0c !  6:  61b5bf1110 strmap: add more utility functions
     @@ Commit message
            * strmap_get_size()
            * strmap_remove()
            * strmap_for_each_entry()
     -      * strmap_free()
     -      * strmap_get_item()
     +      * strmap_get_entry()
      
          I suspect the first four are self-explanatory.
      
     -    strmap_free() differs from strmap_clear() in that the data structure is
     -    not reusable after it is called; strmap_clear() is not sufficient for
     -    the API because without strmap_free() we will leak memory.
     -
     -    strmap_get_item() is similar to strmap_get() except that instead of just
     +    strmap_get_entry() is similar to strmap_get() except that instead of just
          returning the void* value that the string maps to, it returns the
     -    string_list_item that contains both the string and the void* value (or
     +    strmap_entry that contains both the string and the void* value (or
          NULL if the string isn't in the map).  This is helpful because it avoids
          multiple lookups, e.g. in some cases a caller would need to call:
            * strmap_contains() to check that the map has an entry for the string
     @@ Commit message
            * strmap_put() to update/overwrite the value
          If the void* pointer returned really is a pointer, then the last step is
          unnecessary, but if the void* pointer is just cast to an integer then
     -    strmap_put() will be needed.  In contrast, one can call strmap_get_item()
     +    strmap_put() will be needed.  In contrast, one can call strmap_get_entry()
          and then:
            * check if the string was in the map by whether the pointer is NULL
     -      * access the value via item->util
     -      * directly update item->util
     +      * access the value via entry->value
     +      * directly update entry->value
          meaning that we can replace two or three hash table lookups with one.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## strmap.c ##
     -@@ strmap.c: void strmap_init(struct strmap *map)
     - 	hashmap_init(&map->map, cmp_str_entry, NULL, 0);
     - }
     - 
     --void strmap_clear(struct strmap *map, int free_util)
     -+void strmap_free(struct strmap *map, int free_util)
     - {
     - 	struct hashmap_iter iter;
     - 	struct str_entry *e;
     -@@ strmap.c: void strmap_clear(struct strmap *map, int free_util)
     - 	if (!map)
     - 		return;
     - 
     --	hashmap_for_each_entry(&map->map, &iter, e, ent /* member name */) {
     --		free(e->item.string);
     --		if (free_util)
     --			free(e->item.util);
     -+	if (free_util) {
     -+		hashmap_for_each_entry(&map->map, &iter, e, ent) {
     -+			free(e->item.string);
     -+			if (free_util)
     -+				free(e->item.util);
     -+		}
     - 	}
     - 	hashmap_free_entries(&map->map, struct str_entry, ent);
     -+}
     -+
     -+void strmap_clear(struct strmap *map, int free_util)
     -+{
     -+	strmap_free(map, free_util);
     - 	strmap_init(map);
     - }
     - 
      @@ strmap.c: void *strmap_put(struct strmap *map, const char *str, void *data)
       	return old;
       }
       
     -+struct string_list_item *strmap_get_item(struct strmap *map,
     -+					 const char *str)
     ++struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
      +{
     -+	struct str_entry *entry = find_str_entry(map, str);
     -+	return entry ? &entry->item : NULL;
     ++	return find_strmap_entry(map, str);
      +}
      +
       void *strmap_get(struct strmap *map, const char *str)
       {
     - 	struct str_entry *entry = find_str_entry(map, str);
     + 	struct strmap_entry *entry = find_strmap_entry(map, str);
      @@ strmap.c: int strmap_contains(struct strmap *map, const char *str)
       {
     - 	return find_str_entry(map, str) != NULL;
     + 	return find_strmap_entry(map, str) != NULL;
       }
      +
      +void strmap_remove(struct strmap *map, const char *str, int free_util)
      +{
     -+	struct str_entry entry, *ret;
     ++	struct strmap_entry entry, *ret;
      +	hashmap_entry_init(&entry.ent, strhash(str));
     -+	entry.item.string = (char *)str;
     ++	entry.key = str;
      +	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
     -+	if (ret && free_util)
     -+		free(ret->item.util);
     ++	if (!ret)
     ++		return;
     ++	if (free_util)
     ++		free(ret->value);
     ++	if (map->strdup_strings)
     ++		free((char*)ret->key);
      +	free(ret);
      +}
      
       ## strmap.h ##
     -@@ strmap.h: void strmap_init(struct strmap *map);
     - /*
     -  * Remove all entries from the map, releasing any allocated resources.
     -  */
     -+void strmap_free(struct strmap *map, int free_values);
     -+
     -+/*
     -+ * Same as calling strmap_free() followed by strmap_init().
     -+ */
     - void strmap_clear(struct strmap *map, int free_values);
     - 
     - /*
      @@ strmap.h: void strmap_clear(struct strmap *map, int free_values);
        */
       void *strmap_put(struct strmap *map, const char *str, void *data);
       
      +/*
     -+ * Return the string_list_item mapped by "str", or NULL if there is not such
     ++ * Return the strmap_entry mapped by "str", or NULL if there is not such
      + * an item in map.
      + */
     -+struct string_list_item *strmap_get_item(struct strmap *map, const char *str);
     ++struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str);
      +
       /*
        * Return the data pointer mapped by "str", or NULL if the entry does not
     @@ strmap.h: void *strmap_get(struct strmap *map, const char *str);
      +}
      +
      +/*
     -+ * iterate through @map using @iter, @var is a pointer to a type str_entry
     ++ * iterate through @map using @iter, @var is a pointer to a type strmap_entry
      + */
      +#define strmap_for_each_entry(mystrmap, iter, var)	\
      +	for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, \
      +						   OFFSETOF_VAR(var, ent)); \
      +		var; \
      +		var = hashmap_iter_next_entry_offset(iter, \
     -+						OFFSETOF_VAR(var, ent)))
     ++						     OFFSETOF_VAR(var, ent)))
      +
       #endif /* STRMAP_H */
  -:  ---------- >  7:  2ebce0c5d8 strmap: enable faster clearing and reusing of strmaps
  5:  418975b460 !  8:  cc8d702f98 strmap: add functions facilitating use as a string->int map
     @@ Commit message
          isn't the case when we're storing an int value directly in the void*
          slot instead of using the void* slot as a pointer to data.
      
     -    A note on the name: strintmap looks and sounds pretty lame to me, but
     -    after trying to come up with something better and having no luck, I
     -    figured I'd just go with it for a while and then at some point some
     -    better and obvious name would strike me and I could replace it.  Several
     -    months later, I still don't have a better name.  Hopefully someone else
     -    has one.
     +    A note on the name: if anyone has a better name suggestion than
     +    strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
     +    not been able to come up with a better name.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## strmap.c ##
      @@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_util)
     - 		free(ret->item.util);
     + 		free((char*)ret->key);
       	free(ret);
       }
      +
     -+void strintmap_incr(struct strmap *map, const char *str, intptr_t amt)
     ++void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
      +{
     -+	struct str_entry *entry = find_str_entry(map, str);
     ++	struct strmap_entry *entry = find_strmap_entry(&map->map, str);
      +	if (entry) {
     -+		intptr_t *whence = (intptr_t*)&entry->item.util;
     ++		intptr_t *whence = (intptr_t*)&entry->value;
      +		*whence += amt;
      +	}
      +	else
     @@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_util)
       ## strmap.h ##
      @@ strmap.h: static inline unsigned int strmap_get_size(struct strmap *map)
       		var = hashmap_iter_next_entry_offset(iter, \
     - 						OFFSETOF_VAR(var, ent)))
     + 						     OFFSETOF_VAR(var, ent)))
       
     ++
      +/*
     -+ * Helper functions for using strmap as map of string -> int, using the void*
     -+ * field to store the int instead of allocating an int and having the void*
     -+ * member point to the allocated int.
     ++ * strintmap:
     ++ *    A map of string -> int, typecasting the void* of strmap to an int.
     ++ *
     ++ * Primary differences:
     ++ *    1) Since the void* value is just an int in disguise, there is no value
     ++ *       to free.  (Thus one fewer argument to strintmap_clear)
     ++ *    2) strintmap_get() returns an int; it also requires an extra parameter to
     ++ *       be specified so it knows what value to return if the underlying strmap
     ++ *       has not key matching the given string.
     ++ *    3) No strmap_put() equivalent; strintmap_set() and strintmap_incr()
     ++ *       instead.
      + */
      +
     -+static inline int strintmap_get(struct strmap *map, const char *str,
     -+				int default_value)
     ++struct strintmap {
     ++	struct strmap map;
     ++};
     ++
     ++#define strintmap_for_each_entry(mystrmap, iter, var)	\
     ++	strmap_for_each_entry(&(mystrmap)->map, iter, var)
     ++
     ++static inline void strintmap_init(struct strintmap *map)
      +{
     -+	struct string_list_item *result = strmap_get_item(map, str);
     -+	if (!result)
     -+		return default_value;
     -+	return (intptr_t)result->util;
     ++	strmap_init(&map->map);
     ++}
     ++
     ++static inline void strintmap_ocd_init(struct strintmap *map,
     ++				      int strdup_strings)
     ++{
     ++	strmap_ocd_init(&map->map, strdup_strings);
      +}
      +
     -+static inline void strintmap_set(struct strmap *map, const char *str, intptr_t v)
     ++static inline void strintmap_clear(struct strintmap *map)
      +{
     -+	strmap_put(map, str, (void *)v);
     ++	strmap_clear(&map->map, 0);
      +}
      +
     -+void strintmap_incr(struct strmap *map, const char *str, intptr_t amt);
     ++static inline void strintmap_partial_clear(struct strintmap *map)
     ++{
     ++	strmap_partial_clear(&map->map, 0);
     ++}
     ++
     ++static inline int strintmap_contains(struct strintmap *map, const char *str)
     ++{
     ++	return strmap_contains(&map->map, str);
     ++}
      +
     -+static inline void strintmap_clear(struct strmap *map)
     ++static inline void strintmap_remove(struct strintmap *map, const char *str)
      +{
     -+	strmap_clear(map, 0);
     ++	return strmap_remove(&map->map, str, 0);
      +}
      +
     -+static inline void strintmap_free(struct strmap *map)
     ++static inline int strintmap_empty(struct strintmap *map)
      +{
     -+	strmap_free(map, 0);
     ++	return strmap_empty(&map->map);
      +}
     ++
     ++static inline unsigned int strintmap_get_size(struct strintmap *map)
     ++{
     ++	return strmap_get_size(&map->map);
     ++}
     ++
     ++static inline int strintmap_get(struct strintmap *map, const char *str,
     ++				int default_value)
     ++{
     ++	struct strmap_entry *result = strmap_get_entry(&map->map, str);
     ++	if (!result)
     ++		return default_value;
     ++	return (intptr_t)result->value;
     ++}
     ++
     ++static inline void strintmap_set(struct strintmap *map, const char *str,
     ++				 intptr_t v)
     ++{
     ++	strmap_put(&map->map, str, (void *)v);
     ++}
     ++
     ++void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
      +
       #endif /* STRMAP_H */
  -:  ---------- >  9:  490d3a42ad strmap: add a strset sub-type
  4:  b3095d97d8 ! 10:  eca4f1ddba strmap: add strdup_strings option
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    strmap: add strdup_strings option
     +    strmap: enable allocations to come from a mem_pool
      
     -    Just as it is sometimes useful for string_list to duplicate and take
     -    ownership of memory management of the strings it contains, the same is
     -    sometimes true for strmaps as well.  Add the same flag from string_list
     -    to strmap.
     +    For heavy users of strmaps, allowing the keys and entries to be
     +    allocated from a memory pool can provide significant overhead savings.
     +    Add an option to strmap_ocd_init() to specify a memory pool.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## strmap.c ##
     -@@ strmap.c: static struct str_entry *find_str_entry(struct strmap *map,
     - 	return hashmap_get_entry(&map->map, &entry, ent, NULL);
     +@@
     + #include "git-compat-util.h"
     + #include "strmap.h"
     ++#include "mem-pool.h"
     + 
     + static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
     + 			    const struct hashmap_entry *entry1,
     +@@ strmap.c: static struct strmap_entry *find_strmap_entry(struct strmap *map,
     + 
     + void strmap_init(struct strmap *map)
     + {
     +-	strmap_ocd_init(map, 1);
     ++	strmap_ocd_init(map, NULL, 1);
       }
       
     --void strmap_init(struct strmap *map)
     -+void strmap_init(struct strmap *map, int strdup_strings)
     + void strmap_ocd_init(struct strmap *map,
     ++		     struct mem_pool *pool,
     + 		     int strdup_strings)
       {
     - 	hashmap_init(&map->map, cmp_str_entry, NULL, 0);
     -+	map->strdup_strings = strdup_strings;
     + 	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
     ++	map->pool = pool;
     + 	map->strdup_strings = strdup_strings;
       }
       
     - void strmap_free(struct strmap *map, int free_util)
     -@@ strmap.c: void strmap_free(struct strmap *map, int free_util)
     +@@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_util)
       	if (!map)
       		return;
       
     --	if (free_util) {
     -+	if (map->strdup_strings || free_util) {
     - 		hashmap_for_each_entry(&map->map, &iter, e, ent) {
     --			free(e->item.string);
     ++	if (!free_util && map->pool)
     ++		/* Memory other than util is owned by and freed with the pool */
     ++		return;
     ++
     + 	/*
     + 	 * We need to iterate over the hashmap entries and free
     + 	 * e->key and e->value ourselves; hashmap has no API to
     +@@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_util)
     + 	hashmap_for_each_entry(&map->map, &iter, e, ent) {
     + 		if (free_util)
     + 			free(e->value);
     +-		if (map->strdup_strings)
     +-			free((char*)e->key);
     +-		free(e);
     ++		if (!map->pool) {
      +			if (map->strdup_strings)
     -+				free(e->item.string);
     - 			if (free_util)
     - 				free(e->item.util);
     - 		}
     -@@ strmap.c: void strmap_free(struct strmap *map, int free_util)
     - void strmap_clear(struct strmap *map, int free_util)
     - {
     - 	strmap_free(map, free_util);
     --	strmap_init(map);
     -+	strmap_init(map, map->strdup_strings);
     ++				free((char*)e->key);
     ++			free(e);
     ++		}
     + 	}
       }
       
     - /*
     -- * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
     -- * it does not need to persist after the this function is called.
     -+ * Insert "str" into the map, pointing to "data".
     -  *
     -  * If an entry for "str" already exists, its data pointer is overwritten, and
     -  * the original data pointer returned. Otherwise, returns NULL.
      @@ strmap.c: void *strmap_put(struct strmap *map, const char *str, void *data)
     - 	} else {
     - 		entry = xmalloc(sizeof(*entry));
     + 		 */
     + 		const char *key = str;
     + 
     +-		entry = xmalloc(sizeof(*entry));
     ++		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
     ++				  : xmalloc(sizeof(*entry));
       		hashmap_entry_init(&entry->ent, strhash(str));
     --		entry->item.string = strdup(str);
     -+		/*
     -+		 * We won't modify entry->item.string so it really should be
     -+		 * const, but changing string_list_item to use a const char *
     -+		 * is a bit too big of a change at this point.
     -+		 */
     -+		entry->item.string =
     -+			map->strdup_strings ? xstrdup(str) : (char *)str;
     - 		entry->item.util = data;
     + 
     + 		if (map->strdup_strings)
     +-			key = xstrdup(str);
     ++			key = map->pool ? mem_pool_strdup(map->pool, str)
     ++					: xstrdup(str);
     + 		entry->key = key;
     + 		entry->value = data;
       		hashmap_add(&map->map, &entry->ent);
     - 	}
      @@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_util)
     - 	hashmap_entry_init(&entry.ent, strhash(str));
     - 	entry.item.string = (char *)str;
     - 	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
     -+	if (map->strdup_strings)
     -+		free(ret->item.string);
     - 	if (ret && free_util)
     - 		free(ret->item.util);
     - 	free(ret);
     + 		return;
     + 	if (free_util)
     + 		free(ret->value);
     +-	if (map->strdup_strings)
     +-		free((char*)ret->key);
     +-	free(ret);
     ++	if (!map->pool) {
     ++		if (map->strdup_strings)
     ++			free((char*)ret->key);
     ++		free(ret);
     ++	}
     + }
     + 
     + void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
      
       ## strmap.h ##
      @@
       
     + #include "hashmap.h"
     + 
     ++struct mempool;
       struct strmap {
       	struct hashmap map;
     -+	unsigned int strdup_strings:1;
     ++	struct mem_pool *pool;
     + 	unsigned int strdup_strings:1;
       };
       
     - struct str_entry {
     -@@ strmap.h: struct str_entry {
     - };
     +@@ strmap.h: void strmap_init(struct strmap *map);
       
       /*
     -- * Initialize an empty strmap
     -+ * Initialize the members of the strmap, set `strdup_strings`
     -+ * member according to the value of the second parameter.
     +  * Same as strmap_init, but for those who want to control the memory management
     +- * carefully instead of using the default of strdup_strings=1.
     ++ * carefully instead of using the default of strdup_strings=1 and pool=NULL.
     +  * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
     +  * are obsessing over minor details.)
        */
     --void strmap_init(struct strmap *map);
     -+void strmap_init(struct strmap *map, int strdup_strings);
     + void strmap_ocd_init(struct strmap *map,
     ++		     struct mem_pool *pool,
     + 		     int strdup_strings);
       
       /*
     -  * Remove all entries from the map, releasing any allocated resources.
     -@@ strmap.h: void strmap_free(struct strmap *map, int free_values);
     - void strmap_clear(struct strmap *map, int free_values);
     +@@ strmap.h: static inline void strintmap_init(struct strintmap *map)
     + }
       
     - /*
     -- * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
     -- * it does not need to persist after the this function is called.
     -+ * Insert "str" into the map, pointing to "data".
     -  *
     -  * If an entry for "str" already exists, its data pointer is overwritten, and
     -  * the original data pointer returned. Otherwise, returns NULL.
     + static inline void strintmap_ocd_init(struct strintmap *map,
     ++				      struct mem_pool *pool,
     + 				      int strdup_strings)
     + {
     +-	strmap_ocd_init(&map->map, strdup_strings);
     ++	strmap_ocd_init(&map->map, pool, strdup_strings);
     + }
     + 
     + static inline void strintmap_clear(struct strintmap *map)
     +@@ strmap.h: static inline void strset_init(struct strset *set)
     + }
     + 
     + static inline void strset_ocd_init(struct strset *set,
     ++				   struct mem_pool *pool,
     + 				   int strdup_strings)
     + {
     +-	strmap_ocd_init(&set->map, strdup_strings);
     ++	strmap_ocd_init(&set->map, pool, strdup_strings);
     + }
     + 
     + static inline void strset_clear(struct strset *set)

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
@ 2020-10-13  0:40   ` Elijah Newren via GitGitGadget
  2020-10-30 12:50     ` Jeff King
  2020-10-13  0:40   ` [PATCH v2 02/10] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
                     ` (9 subsequent siblings)
  10 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The existence of hashmap_free() and hashmap_free_entries() confused me,
and the docs weren't clear enough.  We are dealing with a map table,
entries in that table, and possibly also things each of those entries
point to.  I had to consult other source code examples and the
implementation.  Add a brief note to clarify the differences.  This will
become even more important once we introduce a new
hashmap_partial_clear() function which will add the question of whether
the table itself has been freed.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.h | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/hashmap.h b/hashmap.h
index b011b394fe..2994dc7a9c 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -236,13 +236,40 @@ void hashmap_init(struct hashmap *map,
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
- * Frees a hashmap structure and allocated memory, leaves entries undisturbed
+ * Frees a hashmap structure and allocated memory for the table, but does not
+ * free the entries nor anything they point to.
+ *
+ * Usage note:
+ *
+ * Many callers will need to iterate over all entries and free the data each
+ * entry points to; in such a case, they can free the entry itself while at it.
+ * Thus, you might see:
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *      free(e);
+ *    }
+ *    hashmap_free(map);
+ *
+ * instead of
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *    }
+ *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ *
+ * to avoid the implicit extra loop over the entries.  However, if there are
+ * no special fields in your entry that need to be freed beyond the entry
+ * itself, it is probably simpler to avoid the explicit loop and just call
+ * hashmap_free_entries().
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map
+ * where @member is the hashmap_entry struct used to associate with @map.
+ *
+ * See usage note above hashmap_free().
  */
 #define hashmap_free_entries(map, type, member) \
 	hashmap_free_(map, offsetof(type, member));
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 02/10] hashmap: adjust spacing to fix argument alignment
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
  2020-10-13  0:40   ` [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
@ 2020-10-13  0:40   ` Elijah Newren via GitGitGadget
  2020-10-30 12:51     ` Jeff King
  2020-10-13  0:40   ` [PATCH v2 03/10] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
                     ` (8 subsequent siblings)
  10 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

No actual code changes; just whitespace adjustments.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index 09813e1a46..e44d8a3e85 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -92,8 +92,9 @@ static void alloc_table(struct hashmap *map, unsigned int size)
 }
 
 static inline int entry_equals(const struct hashmap *map,
-		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
-		const void *keydata)
+			       const struct hashmap_entry *e1,
+			       const struct hashmap_entry *e2,
+			       const void *keydata)
 {
 	return (e1 == e2) ||
 	       (e1->hash == e2->hash &&
@@ -101,7 +102,7 @@ static inline int entry_equals(const struct hashmap *map,
 }
 
 static inline unsigned int bucket(const struct hashmap *map,
-		const struct hashmap_entry *key)
+				  const struct hashmap_entry *key)
 {
 	return key->hash & (map->tablesize - 1);
 }
@@ -148,7 +149,7 @@ static int always_equal(const void *unused_cmp_data,
 }
 
 void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		const void *cmpfn_data, size_t initial_size)
+		  const void *cmpfn_data, size_t initial_size)
 {
 	unsigned int size = HASHMAP_INITIAL_SIZE;
 
@@ -199,7 +200,7 @@ struct hashmap_entry *hashmap_get(const struct hashmap *map,
 }
 
 struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
-			const struct hashmap_entry *entry)
+				       const struct hashmap_entry *entry)
 {
 	struct hashmap_entry *e = entry->next;
 	for (; e; e = e->next)
@@ -225,8 +226,8 @@ void hashmap_add(struct hashmap *map, struct hashmap_entry *entry)
 }
 
 struct hashmap_entry *hashmap_remove(struct hashmap *map,
-					const struct hashmap_entry *key,
-					const void *keydata)
+				     const struct hashmap_entry *key,
+				     const void *keydata)
 {
 	struct hashmap_entry *old;
 	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
@@ -249,7 +250,7 @@ struct hashmap_entry *hashmap_remove(struct hashmap *map,
 }
 
 struct hashmap_entry *hashmap_put(struct hashmap *map,
-				struct hashmap_entry *entry)
+				  struct hashmap_entry *entry)
 {
 	struct hashmap_entry *old = hashmap_remove(map, entry, NULL);
 	hashmap_add(map, entry);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 03/10] hashmap: allow re-use after hashmap_free()
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
  2020-10-13  0:40   ` [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
  2020-10-13  0:40   ` [PATCH v2 02/10] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
@ 2020-10-13  0:40   ` Elijah Newren via GitGitGadget
  2020-10-30 13:35     ` Jeff King
  2020-10-13  0:40   ` [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
                     ` (7 subsequent siblings)
  10 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Previously, once map->table had been freed, any calls to hashmap_put(),
hashmap_get(), or hashmap_remove() would cause a NULL pointer
dereference (since hashmap_free_() also zeros the memory; without that
zeroing, calling these functions would cause a use-after-free problem).

Modify these functions to check for a NULL table and automatically
allocate as needed.

I also thought about creating a HASHMAP_INIT macro to allow initializing
hashmaps on the stack without calling hashmap_init(), but virtually all
uses of hashmap specify a usecase-specific equals_function which defeats
the utility of such a macro.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index e44d8a3e85..bb7c9979b8 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -114,6 +114,7 @@ int hashmap_bucket(const struct hashmap *map, unsigned int hash)
 
 static void rehash(struct hashmap *map, unsigned int newsize)
 {
+	/* map->table MUST NOT be NULL when this function is called */
 	unsigned int i, oldsize = map->tablesize;
 	struct hashmap_entry **oldtable = map->table;
 
@@ -134,6 +135,7 @@ static void rehash(struct hashmap *map, unsigned int newsize)
 static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
 		const struct hashmap_entry *key, const void *keydata)
 {
+	/* map->table MUST NOT be NULL when this function is called */
 	struct hashmap_entry **e = &map->table[bucket(map, key)];
 	while (*e && !entry_equals(map, *e, key, keydata))
 		e = &(*e)->next;
@@ -196,6 +198,8 @@ struct hashmap_entry *hashmap_get(const struct hashmap *map,
 				const struct hashmap_entry *key,
 				const void *keydata)
 {
+	if (!map->table)
+		return NULL;
 	return *find_entry_ptr(map, key, keydata);
 }
 
@@ -211,8 +215,12 @@ struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
 
 void hashmap_add(struct hashmap *map, struct hashmap_entry *entry)
 {
-	unsigned int b = bucket(map, entry);
+	unsigned int b;
+
+	if (!map->table)
+		alloc_table(map, HASHMAP_INITIAL_SIZE);
 
+	b = bucket(map, entry);
 	/* add entry */
 	entry->next = map->table[b];
 	map->table[b] = entry;
@@ -230,7 +238,11 @@ struct hashmap_entry *hashmap_remove(struct hashmap *map,
 				     const void *keydata)
 {
 	struct hashmap_entry *old;
-	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
+	struct hashmap_entry **e;
+
+	if (!map->table)
+		return NULL;
+	e = find_entry_ptr(map, key, keydata);
 	if (!*e)
 		return NULL;
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear()
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2020-10-13  0:40   ` [PATCH v2 03/10] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
@ 2020-10-13  0:40   ` Elijah Newren via GitGitGadget
  2020-10-30 13:41     ` Jeff King
  2020-10-13  0:40   ` [PATCH v2 05/10] strmap: new utility functions Elijah Newren via GitGitGadget
                     ` (6 subsequent siblings)
  10 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge-ort is a heavy user of strmaps, which are built on hashmap.[ch].
reset_maps() in merge-ort was taking about 12% of overall runtime in my
testcase involving rebasing 35 patches of linux.git across a big rename.
reset_maps() was calling hashmap_free() followed by hashmap_init(),
meaning that not only was it freeing all the memory associated with each
of the strmaps just to immediately allocate a new array again, it was
allocating a new array that wasy likely smaller than needed (thus
resulting in later need to rehash things).  The ending size of the map
table on the previous commit was likely almost perfectly sized for the
next commit we wanted to pick, and not dropping and reallocating the
table immediately is a win.

Add some new API to hashmap to clear a hashmap of entries without
freeing map->table (and instead only zeroing it out like alloc_table()
would do, along with zeroing the count of items in the table and the
shrink_at field).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 39 +++++++++++++++++++++++++++------------
 hashmap.h | 13 ++++++++++++-
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index bb7c9979b8..922ed07954 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -174,22 +174,37 @@ void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
 	map->do_count_items = 1;
 }
 
+static void free_individual_entries(struct hashmap *map, ssize_t entry_offset)
+{
+	struct hashmap_iter iter;
+	struct hashmap_entry *e;
+
+	hashmap_iter_init(map, &iter);
+	while ((e = hashmap_iter_next(&iter)))
+		/*
+		 * like container_of, but using caller-calculated
+		 * offset (caller being hashmap_free_entries)
+		 */
+		free((char *)e - entry_offset);
+}
+
+void hashmap_partial_clear_(struct hashmap *map, ssize_t entry_offset)
+{
+	if (!map || !map->table)
+		return;
+	if (entry_offset >= 0)  /* called by hashmap_clear_entries */
+		free_individual_entries(map, entry_offset);
+	memset(map->table, 0, map->tablesize * sizeof(struct hashmap_entry *));
+	map->shrink_at = 0;
+	map->private_size = 0;
+}
+
 void hashmap_free_(struct hashmap *map, ssize_t entry_offset)
 {
 	if (!map || !map->table)
 		return;
-	if (entry_offset >= 0) { /* called by hashmap_free_entries */
-		struct hashmap_iter iter;
-		struct hashmap_entry *e;
-
-		hashmap_iter_init(map, &iter);
-		while ((e = hashmap_iter_next(&iter)))
-			/*
-			 * like container_of, but using caller-calculated
-			 * offset (caller being hashmap_free_entries)
-			 */
-			free((char *)e - entry_offset);
-	}
+	if (entry_offset >= 0)  /* called by hashmap_free_entries */
+		free_individual_entries(map, entry_offset);
 	free(map->table);
 	memset(map, 0, sizeof(*map));
 }
diff --git a/hashmap.h b/hashmap.h
index 2994dc7a9c..056a8cda32 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -232,7 +232,8 @@ void hashmap_init(struct hashmap *map,
 			 const void *equals_function_data,
 			 size_t initial_size);
 
-/* internal function for freeing hashmap */
+/* internal functions for clearing or freeing hashmap */
+void hashmap_partial_clear_(struct hashmap *map, ssize_t offset);
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
@@ -265,6 +266,16 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
+/*
+ * Basically the same as calling hashmap_free() followed by hashmap_init(),
+ * but doesn't incur the overhead of deallocating and reallocating
+ * map->table; it leaves map->table allocated and the same size but zeroes
+ * it out so it's ready for use again as an empty map.  As with
+ * hashmap_free(), you may need to free the entries yourself before calling
+ * this function.
+ */
+#define hashmap_partial_clear(map) hashmap_partial_clear_(map, -1)
+
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
  * where @member is the hashmap_entry struct used to associate with @map.
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 05/10] strmap: new utility functions
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
                     ` (3 preceding siblings ...)
  2020-10-13  0:40   ` [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
@ 2020-10-13  0:40   ` Elijah Newren via GitGitGadget
  2020-10-30 14:12     ` Jeff King
  2020-10-13  0:40   ` [PATCH v2 06/10] strmap: add more " Elijah Newren via GitGitGadget
                     ` (5 subsequent siblings)
  10 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add strmap as a new struct and associated utility functions,
specifically for hashmaps that map strings to some value.  The API is
taken directly from Peff's proposal at
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

A couple of items of note:

  * Similar to string-list, I have a strdup_strings setting.  However,
    unlike string-list, strmap_init() does not take a parameter for this
    setting and instead automatically sets it to 1; callers who want to
    control this detail need to instead call strmap_ocd_init().

  * I do not have a STRMAP_INIT macro.  I could possibly add one, but
      #define STRMAP_INIT { { NULL, cmp_str_entry, NULL, 0, 0, 0, 0, 0 }, 1 }
    feels a bit unwieldy and possibly error-prone in terms of future
    expansion of the hashmap struct.  The fact that cmp_str_entry needs to
    be in there prevents us from passing all zeros for the hashmap, and makes
    me worry that STRMAP_INIT would just be more trouble than it is worth.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Makefile |   1 +
 strmap.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 strmap.h |  57 +++++++++++++++++++++++++++++++
 3 files changed, 160 insertions(+)
 create mode 100644 strmap.c
 create mode 100644 strmap.h

diff --git a/Makefile b/Makefile
index 95571ee3fc..777a34c01c 100644
--- a/Makefile
+++ b/Makefile
@@ -1000,6 +1000,7 @@ LIB_OBJS += stable-qsort.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
+LIB_OBJS += strmap.o
 LIB_OBJS += strvec.o
 LIB_OBJS += sub-process.o
 LIB_OBJS += submodule-config.o
diff --git a/strmap.c b/strmap.c
new file mode 100644
index 0000000000..4b48d64274
--- /dev/null
+++ b/strmap.c
@@ -0,0 +1,102 @@
+#include "git-compat-util.h"
+#include "strmap.h"
+
+static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
+			    const struct hashmap_entry *entry1,
+			    const struct hashmap_entry *entry2,
+			    const void *keydata)
+{
+	const struct strmap_entry *e1, *e2;
+
+	e1 = container_of(entry1, const struct strmap_entry, ent);
+	e2 = container_of(entry2, const struct strmap_entry, ent);
+	return strcmp(e1->key, e2->key);
+}
+
+static struct strmap_entry *find_strmap_entry(struct strmap *map,
+					      const char *str)
+{
+	struct strmap_entry entry;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.key = str;
+	return hashmap_get_entry(&map->map, &entry, ent, NULL);
+}
+
+void strmap_init(struct strmap *map)
+{
+	strmap_ocd_init(map, 1);
+}
+
+void strmap_ocd_init(struct strmap *map,
+		     int strdup_strings)
+{
+	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
+	map->strdup_strings = strdup_strings;
+}
+
+static void strmap_free_entries_(struct strmap *map, int free_util)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+
+	if (!map)
+		return;
+
+	/*
+	 * We need to iterate over the hashmap entries and free
+	 * e->key and e->value ourselves; hashmap has no API to
+	 * take care of that for us.  Since we're already iterating over
+	 * the hashmap, though, might as well free e too and avoid the need
+	 * to make some call into the hashmap API to do that.
+	 */
+	hashmap_for_each_entry(&map->map, &iter, e, ent) {
+		if (free_util)
+			free(e->value);
+		if (map->strdup_strings)
+			free((char*)e->key);
+		free(e);
+	}
+}
+
+void strmap_clear(struct strmap *map, int free_util)
+{
+	strmap_free_entries_(map, free_util);
+	hashmap_free(&map->map);
+}
+
+void *strmap_put(struct strmap *map, const char *str, void *data)
+{
+	struct strmap_entry *entry = find_strmap_entry(map, str);
+	void *old = NULL;
+
+	if (entry) {
+		old = entry->value;
+		entry->value = data;
+	} else {
+		/*
+		 * We won't modify entry->key so it really should be const.
+		 */
+		const char *key = str;
+
+		entry = xmalloc(sizeof(*entry));
+		hashmap_entry_init(&entry->ent, strhash(str));
+
+		if (map->strdup_strings)
+			key = xstrdup(str);
+		entry->key = key;
+		entry->value = data;
+		hashmap_add(&map->map, &entry->ent);
+	}
+	return old;
+}
+
+void *strmap_get(struct strmap *map, const char *str)
+{
+	struct strmap_entry *entry = find_strmap_entry(map, str);
+	return entry ? entry->value : NULL;
+}
+
+int strmap_contains(struct strmap *map, const char *str)
+{
+	return find_strmap_entry(map, str) != NULL;
+}
diff --git a/strmap.h b/strmap.h
new file mode 100644
index 0000000000..493d19cbc0
--- /dev/null
+++ b/strmap.h
@@ -0,0 +1,57 @@
+#ifndef STRMAP_H
+#define STRMAP_H
+
+#include "hashmap.h"
+
+struct strmap {
+	struct hashmap map;
+	unsigned int strdup_strings:1;
+};
+
+struct strmap_entry {
+	struct hashmap_entry ent;
+	const char *key;
+	void *value;
+};
+
+/*
+ * Initialize the members of the strmap.  Any keys added to the strmap will
+ * be strdup'ed with their memory managed by the strmap.
+ */
+void strmap_init(struct strmap *map);
+
+/*
+ * Same as strmap_init, but for those who want to control the memory management
+ * carefully instead of using the default of strdup_strings=1.
+ * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
+ * are obsessing over minor details.)
+ */
+void strmap_ocd_init(struct strmap *map,
+		     int strdup_strings);
+
+/*
+ * Remove all entries from the map, releasing any allocated resources.
+ */
+void strmap_clear(struct strmap *map, int free_values);
+
+/*
+ * Insert "str" into the map, pointing to "data".
+ *
+ * If an entry for "str" already exists, its data pointer is overwritten, and
+ * the original data pointer returned. Otherwise, returns NULL.
+ */
+void *strmap_put(struct strmap *map, const char *str, void *data);
+
+/*
+ * Return the data pointer mapped by "str", or NULL if the entry does not
+ * exist.
+ */
+void *strmap_get(struct strmap *map, const char *str);
+
+/*
+ * Return non-zero iff "str" is present in the map. This differs from
+ * strmap_get() in that it can distinguish entries with a NULL data pointer.
+ */
+int strmap_contains(struct strmap *map, const char *str);
+
+#endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 06/10] strmap: add more utility functions
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
                     ` (4 preceding siblings ...)
  2020-10-13  0:40   ` [PATCH v2 05/10] strmap: new utility functions Elijah Newren via GitGitGadget
@ 2020-10-13  0:40   ` Elijah Newren via GitGitGadget
  2020-10-30 14:23     ` Jeff King
  2020-10-13  0:40   ` [PATCH v2 07/10] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This adds a number of additional convienence functions I want/need:
  * strmap_empty()
  * strmap_get_size()
  * strmap_remove()
  * strmap_for_each_entry()
  * strmap_get_entry()

I suspect the first four are self-explanatory.

strmap_get_entry() is similar to strmap_get() except that instead of just
returning the void* value that the string maps to, it returns the
strmap_entry that contains both the string and the void* value (or
NULL if the string isn't in the map).  This is helpful because it avoids
multiple lookups, e.g. in some cases a caller would need to call:
  * strmap_contains() to check that the map has an entry for the string
  * strmap_get() to get the void* value
  * <do some work to update the value>
  * strmap_put() to update/overwrite the value
If the void* pointer returned really is a pointer, then the last step is
unnecessary, but if the void* pointer is just cast to an integer then
strmap_put() will be needed.  In contrast, one can call strmap_get_entry()
and then:
  * check if the string was in the map by whether the pointer is NULL
  * access the value via entry->value
  * directly update entry->value
meaning that we can replace two or three hash table lookups with one.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 20 ++++++++++++++++++++
 strmap.h | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/strmap.c b/strmap.c
index 4b48d64274..909b9fbedf 100644
--- a/strmap.c
+++ b/strmap.c
@@ -90,6 +90,11 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 	return old;
 }
 
+struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
+{
+	return find_strmap_entry(map, str);
+}
+
 void *strmap_get(struct strmap *map, const char *str)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
@@ -100,3 +105,18 @@ int strmap_contains(struct strmap *map, const char *str)
 {
 	return find_strmap_entry(map, str) != NULL;
 }
+
+void strmap_remove(struct strmap *map, const char *str, int free_util)
+{
+	struct strmap_entry entry, *ret;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.key = str;
+	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
+	if (!ret)
+		return;
+	if (free_util)
+		free(ret->value);
+	if (map->strdup_strings)
+		free((char*)ret->key);
+	free(ret);
+}
diff --git a/strmap.h b/strmap.h
index 493d19cbc0..e49d020970 100644
--- a/strmap.h
+++ b/strmap.h
@@ -42,6 +42,12 @@ void strmap_clear(struct strmap *map, int free_values);
  */
 void *strmap_put(struct strmap *map, const char *str, void *data);
 
+/*
+ * Return the strmap_entry mapped by "str", or NULL if there is not such
+ * an item in map.
+ */
+struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str);
+
 /*
  * Return the data pointer mapped by "str", or NULL if the entry does not
  * exist.
@@ -54,4 +60,36 @@ void *strmap_get(struct strmap *map, const char *str);
  */
 int strmap_contains(struct strmap *map, const char *str);
 
+/*
+ * Remove the given entry from the strmap.  If the string isn't in the
+ * strmap, the map is not altered.
+ */
+void strmap_remove(struct strmap *map, const char *str, int free_value);
+
+/*
+ * Return whether the strmap is empty.
+ */
+static inline int strmap_empty(struct strmap *map)
+{
+	return hashmap_get_size(&map->map) == 0;
+}
+
+/*
+ * Return how many entries the strmap has.
+ */
+static inline unsigned int strmap_get_size(struct strmap *map)
+{
+	return hashmap_get_size(&map->map);
+}
+
+/*
+ * iterate through @map using @iter, @var is a pointer to a type strmap_entry
+ */
+#define strmap_for_each_entry(mystrmap, iter, var)	\
+	for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, \
+						   OFFSETOF_VAR(var, ent)); \
+		var; \
+		var = hashmap_iter_next_entry_offset(iter, \
+						     OFFSETOF_VAR(var, ent)))
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 07/10] strmap: enable faster clearing and reusing of strmaps
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
                     ` (5 preceding siblings ...)
  2020-10-13  0:40   ` [PATCH v2 06/10] strmap: add more " Elijah Newren via GitGitGadget
@ 2020-10-13  0:40   ` Elijah Newren via GitGitGadget
  2020-10-30 14:27     ` Jeff King
  2020-10-13  0:40   ` [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
                     ` (3 subsequent siblings)
  10 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

When strmaps are used heavily, such as is done by my new merge-ort
algorithm, and strmaps need to be cleared but then re-used (because of
e.g. picking multiple commits to cherry-pick, or due to a recursive
merge having several different merges while recursing), free-ing and
reallocating map->table repeatedly can add up in time, especially since
it will likely be reallocated to a much smaller size but the previous
merge provides a good guide to the right size to use for the next merge.

Introduce strmap_partial_clear() to take advantage of this type of
situation; it will act similar to strmap_clear() except that
map->table's entries are zeroed instead of map->table being free'd.
Making use of this function reduced the cost of reset_maps() by about
20% in mert-ort, and dropped the overall runtime of my rebase testcase
by just under 2%.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 6 ++++++
 strmap.h | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/strmap.c b/strmap.c
index 909b9fbedf..47cbf11ec7 100644
--- a/strmap.c
+++ b/strmap.c
@@ -64,6 +64,12 @@ void strmap_clear(struct strmap *map, int free_util)
 	hashmap_free(&map->map);
 }
 
+void strmap_partial_clear(struct strmap *map, int free_util)
+{
+	strmap_free_entries_(map, free_util);
+	hashmap_partial_clear(&map->map);
+}
+
 void *strmap_put(struct strmap *map, const char *str, void *data)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
diff --git a/strmap.h b/strmap.h
index e49d020970..5bb7650d65 100644
--- a/strmap.h
+++ b/strmap.h
@@ -34,6 +34,12 @@ void strmap_ocd_init(struct strmap *map,
  */
 void strmap_clear(struct strmap *map, int free_values);
 
+/*
+ * Similar to strmap_clear() but leaves map->map->table allocated and
+ * pre-sized so that subsequent uses won't need as many rehashings.
+ */
+void strmap_partial_clear(struct strmap *map, int free_values);
+
 /*
  * Insert "str" into the map, pointing to "data".
  *
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
                     ` (6 preceding siblings ...)
  2020-10-13  0:40   ` [PATCH v2 07/10] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
@ 2020-10-13  0:40   ` Elijah Newren via GitGitGadget
  2020-10-30 14:39     ` Jeff King
  2020-10-13  0:40   ` [PATCH v2 09/10] strmap: add a strset sub-type Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  10 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Although strmap could be used as a string->int map, one either had to
allocate an int for every entry and then deallocate later, or one had to
do a bunch of casting between (void*) and (intptr_t).

Add some special functions that do the casting.  Also, rename put->set
for such wrapper functions since 'put' implied there may be some
deallocation needed if the string was already found in the map, which
isn't the case when we're storing an int value directly in the void*
slot instead of using the void* slot as a pointer to data.

A note on the name: if anyone has a better name suggestion than
strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
not been able to come up with a better name.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 11 ++++++++
 strmap.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/strmap.c b/strmap.c
index 47cbf11ec7..d5003a79e3 100644
--- a/strmap.c
+++ b/strmap.c
@@ -126,3 +126,14 @@ void strmap_remove(struct strmap *map, const char *str, int free_util)
 		free((char*)ret->key);
 	free(ret);
 }
+
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
+{
+	struct strmap_entry *entry = find_strmap_entry(&map->map, str);
+	if (entry) {
+		intptr_t *whence = (intptr_t*)&entry->value;
+		*whence += amt;
+	}
+	else
+		strintmap_set(map, str, amt);
+}
diff --git a/strmap.h b/strmap.h
index 5bb7650d65..fe15e74b78 100644
--- a/strmap.h
+++ b/strmap.h
@@ -98,4 +98,84 @@ static inline unsigned int strmap_get_size(struct strmap *map)
 		var = hashmap_iter_next_entry_offset(iter, \
 						     OFFSETOF_VAR(var, ent)))
 
+
+/*
+ * strintmap:
+ *    A map of string -> int, typecasting the void* of strmap to an int.
+ *
+ * Primary differences:
+ *    1) Since the void* value is just an int in disguise, there is no value
+ *       to free.  (Thus one fewer argument to strintmap_clear)
+ *    2) strintmap_get() returns an int; it also requires an extra parameter to
+ *       be specified so it knows what value to return if the underlying strmap
+ *       has not key matching the given string.
+ *    3) No strmap_put() equivalent; strintmap_set() and strintmap_incr()
+ *       instead.
+ */
+
+struct strintmap {
+	struct strmap map;
+};
+
+#define strintmap_for_each_entry(mystrmap, iter, var)	\
+	strmap_for_each_entry(&(mystrmap)->map, iter, var)
+
+static inline void strintmap_init(struct strintmap *map)
+{
+	strmap_init(&map->map);
+}
+
+static inline void strintmap_ocd_init(struct strintmap *map,
+				      int strdup_strings)
+{
+	strmap_ocd_init(&map->map, strdup_strings);
+}
+
+static inline void strintmap_clear(struct strintmap *map)
+{
+	strmap_clear(&map->map, 0);
+}
+
+static inline void strintmap_partial_clear(struct strintmap *map)
+{
+	strmap_partial_clear(&map->map, 0);
+}
+
+static inline int strintmap_contains(struct strintmap *map, const char *str)
+{
+	return strmap_contains(&map->map, str);
+}
+
+static inline void strintmap_remove(struct strintmap *map, const char *str)
+{
+	return strmap_remove(&map->map, str, 0);
+}
+
+static inline int strintmap_empty(struct strintmap *map)
+{
+	return strmap_empty(&map->map);
+}
+
+static inline unsigned int strintmap_get_size(struct strintmap *map)
+{
+	return strmap_get_size(&map->map);
+}
+
+static inline int strintmap_get(struct strintmap *map, const char *str,
+				int default_value)
+{
+	struct strmap_entry *result = strmap_get_entry(&map->map, str);
+	if (!result)
+		return default_value;
+	return (intptr_t)result->value;
+}
+
+static inline void strintmap_set(struct strintmap *map, const char *str,
+				 intptr_t v)
+{
+	strmap_put(&map->map, str, (void *)v);
+}
+
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 09/10] strmap: add a strset sub-type
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
                     ` (7 preceding siblings ...)
  2020-10-13  0:40   ` [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
@ 2020-10-13  0:40   ` Elijah Newren via GitGitGadget
  2020-10-30 14:44     ` Jeff King
  2020-10-13  0:40   ` [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
  10 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Similar to adding strintmap for special-casing a string -> int mapping,
add a strset type for cases where we really are only interested in using
strmap for storing a set rather than a mapping.  In this case, we'll
always just store NULL for the value but the different struct type makes
it clearer than code comments how a variable is intended to be used.

The difference in usage also results in some differences in API: a few
things that aren't necessary or meaningful are dropped (namely, the
free_util argument to *_clear(), and the *_get() function), and
strset_add() is chosen as the API instead of strset_put().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.h | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/strmap.h b/strmap.h
index fe15e74b78..2ad6696950 100644
--- a/strmap.h
+++ b/strmap.h
@@ -178,4 +178,68 @@ static inline void strintmap_set(struct strintmap *map, const char *str,
 
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
 
+/*
+ * strset:
+ *    A set of strings.
+ *
+ * Primary differences with strmap:
+ *    1) The value is always NULL, and ignored.  As there is no value to free,
+ *       there is one fewer argument to strset_clear
+ *    2) No strset_get() because there is no value.
+ *    3) No strset_put(); use strset_add() instead.
+ */
+
+struct strset {
+	struct strmap map;
+};
+
+#define strset_for_each_entry(mystrset, iter, var)	\
+	strmap_for_each_entry(&(mystrset)->map, iter, var)
+
+static inline void strset_init(struct strset *set)
+{
+	strmap_init(&set->map);
+}
+
+static inline void strset_ocd_init(struct strset *set,
+				   int strdup_strings)
+{
+	strmap_ocd_init(&set->map, strdup_strings);
+}
+
+static inline void strset_clear(struct strset *set)
+{
+	strmap_clear(&set->map, 0);
+}
+
+static inline void strset_partial_clear(struct strset *set)
+{
+	strmap_partial_clear(&set->map, 0);
+}
+
+static inline int strset_contains(struct strset *set, const char *str)
+{
+	return strmap_contains(&set->map, str);
+}
+
+static inline void strset_remove(struct strset *set, const char *str)
+{
+	return strmap_remove(&set->map, str, 0);
+}
+
+static inline int strset_empty(struct strset *set)
+{
+	return strmap_empty(&set->map);
+}
+
+static inline unsigned int strset_get_size(struct strset *set)
+{
+	return strmap_get_size(&set->map);
+}
+
+static inline void strset_add(struct strset *set, const char *str)
+{
+	strmap_put(&set->map, str, NULL);
+}
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
                     ` (8 preceding siblings ...)
  2020-10-13  0:40   ` [PATCH v2 09/10] strmap: add a strset sub-type Elijah Newren via GitGitGadget
@ 2020-10-13  0:40   ` Elijah Newren via GitGitGadget
  2020-10-30 14:56     ` Jeff King
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
  10 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-10-13  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For heavy users of strmaps, allowing the keys and entries to be
allocated from a memory pool can provide significant overhead savings.
Add an option to strmap_ocd_init() to specify a memory pool.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 31 ++++++++++++++++++++++---------
 strmap.h | 11 ++++++++---
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/strmap.c b/strmap.c
index d5003a79e3..83b9de961c 100644
--- a/strmap.c
+++ b/strmap.c
@@ -1,5 +1,6 @@
 #include "git-compat-util.h"
 #include "strmap.h"
+#include "mem-pool.h"
 
 static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 			    const struct hashmap_entry *entry1,
@@ -24,13 +25,15 @@ static struct strmap_entry *find_strmap_entry(struct strmap *map,
 
 void strmap_init(struct strmap *map)
 {
-	strmap_ocd_init(map, 1);
+	strmap_ocd_init(map, NULL, 1);
 }
 
 void strmap_ocd_init(struct strmap *map,
+		     struct mem_pool *pool,
 		     int strdup_strings)
 {
 	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
+	map->pool = pool;
 	map->strdup_strings = strdup_strings;
 }
 
@@ -42,6 +45,10 @@ static void strmap_free_entries_(struct strmap *map, int free_util)
 	if (!map)
 		return;
 
+	if (!free_util && map->pool)
+		/* Memory other than util is owned by and freed with the pool */
+		return;
+
 	/*
 	 * We need to iterate over the hashmap entries and free
 	 * e->key and e->value ourselves; hashmap has no API to
@@ -52,9 +59,11 @@ static void strmap_free_entries_(struct strmap *map, int free_util)
 	hashmap_for_each_entry(&map->map, &iter, e, ent) {
 		if (free_util)
 			free(e->value);
-		if (map->strdup_strings)
-			free((char*)e->key);
-		free(e);
+		if (!map->pool) {
+			if (map->strdup_strings)
+				free((char*)e->key);
+			free(e);
+		}
 	}
 }
 
@@ -84,11 +93,13 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 		 */
 		const char *key = str;
 
-		entry = xmalloc(sizeof(*entry));
+		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
+				  : xmalloc(sizeof(*entry));
 		hashmap_entry_init(&entry->ent, strhash(str));
 
 		if (map->strdup_strings)
-			key = xstrdup(str);
+			key = map->pool ? mem_pool_strdup(map->pool, str)
+					: xstrdup(str);
 		entry->key = key;
 		entry->value = data;
 		hashmap_add(&map->map, &entry->ent);
@@ -122,9 +133,11 @@ void strmap_remove(struct strmap *map, const char *str, int free_util)
 		return;
 	if (free_util)
 		free(ret->value);
-	if (map->strdup_strings)
-		free((char*)ret->key);
-	free(ret);
+	if (!map->pool) {
+		if (map->strdup_strings)
+			free((char*)ret->key);
+		free(ret);
+	}
 }
 
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
diff --git a/strmap.h b/strmap.h
index 2ad6696950..b93b7c9fd6 100644
--- a/strmap.h
+++ b/strmap.h
@@ -3,8 +3,10 @@
 
 #include "hashmap.h"
 
+struct mempool;
 struct strmap {
 	struct hashmap map;
+	struct mem_pool *pool;
 	unsigned int strdup_strings:1;
 };
 
@@ -22,11 +24,12 @@ void strmap_init(struct strmap *map);
 
 /*
  * Same as strmap_init, but for those who want to control the memory management
- * carefully instead of using the default of strdup_strings=1.
+ * carefully instead of using the default of strdup_strings=1 and pool=NULL.
  * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
  * are obsessing over minor details.)
  */
 void strmap_ocd_init(struct strmap *map,
+		     struct mem_pool *pool,
 		     int strdup_strings);
 
 /*
@@ -126,9 +129,10 @@ static inline void strintmap_init(struct strintmap *map)
 }
 
 static inline void strintmap_ocd_init(struct strintmap *map,
+				      struct mem_pool *pool,
 				      int strdup_strings)
 {
-	strmap_ocd_init(&map->map, strdup_strings);
+	strmap_ocd_init(&map->map, pool, strdup_strings);
 }
 
 static inline void strintmap_clear(struct strintmap *map)
@@ -202,9 +206,10 @@ static inline void strset_init(struct strset *set)
 }
 
 static inline void strset_ocd_init(struct strset *set,
+				   struct mem_pool *pool,
 				   int strdup_strings)
 {
-	strmap_ocd_init(&set->map, strdup_strings);
+	strmap_ocd_init(&set->map, pool, strdup_strings);
 }
 
 static inline void strset_clear(struct strset *set)
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-10-13  0:40   ` [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
@ 2020-10-30 12:50     ` Jeff King
  2020-10-30 19:55       ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-10-30 12:50 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Oct 13, 2020 at 12:40:41AM +0000, Elijah Newren via GitGitGadget wrote:

> The existence of hashmap_free() and hashmap_free_entries() confused me,
> and the docs weren't clear enough.  We are dealing with a map table,
> entries in that table, and possibly also things each of those entries
> point to.  I had to consult other source code examples and the
> implementation.  Add a brief note to clarify the differences.  This will
> become even more important once we introduce a new
> hashmap_partial_clear() function which will add the question of whether
> the table itself has been freed.

This is a definite improvement, and I don't see any inaccuracies in the
descriptions. I do think some re-naming would help in the long run,
though. E.g.:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

So always call it "clear", but allow options in two dimensions (partial
or not, free entries or not).

Those could be parameters to a single function, but I think it gets a
little ugly because "and_free" requires passing in the type of the
entries in order to find the pointers.

The "not" cases are implied in the names, but hashmap_clear_full() would
be OK with me, too.

But I think in the current scheme that "free" is somewhat overloaded,
and if we end with a "clear" and a "free" that seems confusing to me.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 02/10] hashmap: adjust spacing to fix argument alignment
  2020-10-13  0:40   ` [PATCH v2 02/10] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
@ 2020-10-30 12:51     ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-10-30 12:51 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Oct 13, 2020 at 12:40:42AM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> No actual code changes; just whitespace adjustments.

Obviously good. Thanks for splitting this into its own patch.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 03/10] hashmap: allow re-use after hashmap_free()
  2020-10-13  0:40   ` [PATCH v2 03/10] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
@ 2020-10-30 13:35     ` Jeff King
  2020-10-30 15:37       ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-10-30 13:35 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Oct 13, 2020 at 12:40:43AM +0000, Elijah Newren via GitGitGadget wrote:

> Previously, once map->table had been freed, any calls to hashmap_put(),
> hashmap_get(), or hashmap_remove() would cause a NULL pointer
> dereference (since hashmap_free_() also zeros the memory; without that
> zeroing, calling these functions would cause a use-after-free problem).
> 
> Modify these functions to check for a NULL table and automatically
> allocate as needed.

Unsurprisingly, I like this direction. The code looks correct to me,
though I think you could reduce duplication slightly by checking
map->table in find_entry_ptr(). That covers both hashmap_get() and
hashmap_remove(). But I'm happy either way.

> I also thought about creating a HASHMAP_INIT macro to allow initializing
> hashmaps on the stack without calling hashmap_init(), but virtually all
> uses of hashmap specify a usecase-specific equals_function which defeats
> the utility of such a macro.

This part I disagree with. If we did:

  #define HASHMAP_INIT(fn, data) = { .cmpfn = cmpfn, cmpfn_data = data }

then many callers could avoid handling the lazy-init themselves. E.g.:

 attr.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/attr.c b/attr.c
index a826b2ef1f..55a2783f1b 100644
--- a/attr.c
+++ b/attr.c
@@ -57,7 +57,9 @@ static inline void hashmap_unlock(struct attr_hashmap *map)
  * is a singleton object which is shared between threads.
  * Access to this dictionary must be surrounded with a mutex.
  */
-static struct attr_hashmap g_attr_hashmap;
+static struct attr_hashmap g_attr_hashmap = {
+	HASHMAP_INIT(attr_hash_entry_cmp, NULL)
+};
 
 /* The container for objects stored in "struct attr_hashmap" */
 struct attr_hash_entry {
@@ -80,12 +82,6 @@ static int attr_hash_entry_cmp(const void *unused_cmp_data,
 	return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
 }
 
-/* Initialize an 'attr_hashmap' object */
-static void attr_hashmap_init(struct attr_hashmap *map)
-{
-	hashmap_init(&map->map, attr_hash_entry_cmp, NULL, 0);
-}
-
 /*
  * Retrieve the 'value' stored in a hashmap given the provided 'key'.
  * If there is no matching entry, return NULL.
@@ -96,9 +92,6 @@ static void *attr_hashmap_get(struct attr_hashmap *map,
 	struct attr_hash_entry k;
 	struct attr_hash_entry *e;
 
-	if (!map->map.tablesize)
-		attr_hashmap_init(map);
-
 	hashmap_entry_init(&k.ent, memhash(key, keylen));
 	k.key = key;
 	k.keylen = keylen;
@@ -114,9 +107,6 @@ static void attr_hashmap_add(struct attr_hashmap *map,
 {
 	struct attr_hash_entry *e;
 
-	if (!map->map.tablesize)
-		attr_hashmap_init(map);
-
 	e = xmalloc(sizeof(struct attr_hash_entry));
 	hashmap_entry_init(&e->ent, memhash(key, keylen));
 	e->key = key;

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear()
  2020-10-13  0:40   ` [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
@ 2020-10-30 13:41     ` Jeff King
  2020-10-30 16:03       ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-10-30 13:41 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Oct 13, 2020 at 12:40:44AM +0000, Elijah Newren via GitGitGadget wrote:

> merge-ort is a heavy user of strmaps, which are built on hashmap.[ch].
> reset_maps() in merge-ort was taking about 12% of overall runtime in my
> testcase involving rebasing 35 patches of linux.git across a big rename.
> reset_maps() was calling hashmap_free() followed by hashmap_init(),
> meaning that not only was it freeing all the memory associated with each
> of the strmaps just to immediately allocate a new array again, it was
> allocating a new array that wasy likely smaller than needed (thus

s/wasy/was/

> resulting in later need to rehash things).  The ending size of the map
> table on the previous commit was likely almost perfectly sized for the
> next commit we wanted to pick, and not dropping and reallocating the
> table immediately is a win.
> 
> Add some new API to hashmap to clear a hashmap of entries without
> freeing map->table (and instead only zeroing it out like alloc_table()
> would do, along with zeroing the count of items in the table and the
> shrink_at field).

This seems like a reasonable optimization to make, and doesn't make the
API significantly more complicated. I'd expect the allocation of actual
entry objects to dwarf the table allocation, but I guess:

  - you'll deal with the individual entries later using a mempool

  - it's not just the allocation, but the re-insertion of the entries as
    we grow

It would be nice if we had some actual perf numbers to report here, so
we could know exactly how much it was buying us. But I guess things are
a bit out-of-order there. You want to do this series first and then
build merge-ort on top as a user. We could introduce the basic data
structure first, then merge-ort, and then start applying optimizations
with real-world measurements. But I'm not sure it's worth the amount of
time you'd have to spend to reorganize in that way.

>  hashmap.c | 39 +++++++++++++++++++++++++++------------
>  hashmap.h | 13 ++++++++++++-

The implementation itself looks correct to me. I already mentioned my
thoughts on naming in patch 1.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 05/10] strmap: new utility functions
  2020-10-13  0:40   ` [PATCH v2 05/10] strmap: new utility functions Elijah Newren via GitGitGadget
@ 2020-10-30 14:12     ` Jeff King
  2020-10-30 16:26       ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-10-30 14:12 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Oct 13, 2020 at 12:40:45AM +0000, Elijah Newren via GitGitGadget wrote:

> Add strmap as a new struct and associated utility functions,
> specifically for hashmaps that map strings to some value.  The API is
> taken directly from Peff's proposal at
> https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

This looks overall sane to me. I mentioned elsewhere that we could be
using FLEXPTR_ALLOC to save an extra allocation. I think it's easy and
worth doing here, as the logic would be completely contained within
strmap_put():

  if (strdup_strings)
	FLEXPTR_ALLOC_STR(entry, key, str);
  else {
	entry = xmalloc(sizeof(*entry));
	entry->key = str;
  }

And free_entries() then doesn't even have to care about strdup_strings.

> A couple of items of note:
> 
>   * Similar to string-list, I have a strdup_strings setting.  However,
>     unlike string-list, strmap_init() does not take a parameter for this
>     setting and instead automatically sets it to 1; callers who want to
>     control this detail need to instead call strmap_ocd_init().

That seems reasonable. It could just be a parameter, but I like that you
push people in the direction of doing the simple and safe thing, rather
than having them wonder whether they ought to set strdup_strings or not.

>   * I do not have a STRMAP_INIT macro.  I could possibly add one, but
>       #define STRMAP_INIT { { NULL, cmp_str_entry, NULL, 0, 0, 0, 0, 0 }, 1 }
>     feels a bit unwieldy and possibly error-prone in terms of future
>     expansion of the hashmap struct.  The fact that cmp_str_entry needs to
>     be in there prevents us from passing all zeros for the hashmap, and makes
>     me worry that STRMAP_INIT would just be more trouble than it is worth.

You can actually omit everything after cmp_str_entry, and those fields
would all get zero-initialized. But we also allow C99 designed
initializers these days. Coupled with the HASHMAP_INIT() I mentioned in
the earlier email, you'd have:

  #define STRMAP_INIT { \
		.map = HASHMAP_INIT(cmp_strmap_entry, NULL), \
		.strdup_strings = 1, \
	  }

which seems pretty maintainable.

> +static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
> +			    const struct hashmap_entry *entry1,
> +			    const struct hashmap_entry *entry2,
> +			    const void *keydata)
> +{
> +	const struct strmap_entry *e1, *e2;
> +
> +	e1 = container_of(entry1, const struct strmap_entry, ent);
> +	e2 = container_of(entry2, const struct strmap_entry, ent);
> +	return strcmp(e1->key, e2->key);
> +}

I expected to use keydata here, but it's pretty easy to make a fake
strmap_entry because of the use of the "key" pointer. So that makes
sense.

> +static void strmap_free_entries_(struct strmap *map, int free_util)

You use the term "value" for the mapped-to value in this iteration. So
perhaps free_values here (and in other functions) would be a better
name?

> +	/*
> +	 * We need to iterate over the hashmap entries and free
> +	 * e->key and e->value ourselves; hashmap has no API to
> +	 * take care of that for us.  Since we're already iterating over
> +	 * the hashmap, though, might as well free e too and avoid the need
> +	 * to make some call into the hashmap API to do that.
> +	 */
> +	hashmap_for_each_entry(&map->map, &iter, e, ent) {
> +		if (free_util)
> +			free(e->value);
> +		if (map->strdup_strings)
> +			free((char*)e->key);
> +		free(e);
> +	}
> +}

Yep, makes sense.

> +void strmap_clear(struct strmap *map, int free_util)
> +{
> +	strmap_free_entries_(map, free_util);
> +	hashmap_free(&map->map);
> +}

This made me wonder about a partial_clear(), but it looks like that
comes later.

> +void *strmap_put(struct strmap *map, const char *str, void *data)
> +{
> +	struct strmap_entry *entry = find_strmap_entry(map, str);
> +	void *old = NULL;
> +
> +	if (entry) {
> +		old = entry->value;
> +		entry->value = data;

Here's a weird hypothetical. If strdup_strings is not set and I do:

  const char *one = xstrdup("foo");
  const char *two = xstrdup("foo");

  hashmap_put(map, one, x);
  hashmap_put(map, two, y);

it's clear that the value should be pointing to "y" afterwards (and you
return "x" so the caller can free it or whatever, good).

But which key should the entry be pointing to? The old one or the new
one? I'm trying and failing to think of a case where it would matter.
Certainly I could add a free() to the toy above where it would, but it
feels like a real caller would have to have pretty convoluted memory
lifetime semantics for it to make a difference.

So I'm not really asking for a particular behavior, but just bringing it
up in case you can think of something relevant.

> +	} else {
> +		/*
> +		 * We won't modify entry->key so it really should be const.
> +		 */
> +		const char *key = str;

The "should be" here confused me. It _is_ const. I'd probably just
delete the comment entirely, but perhaps:

  /*
   * We'll store a const pointer. For non-duplicated strings, they belong
   * to the caller and we received them as const in the first place. For
   * our duplicated ones, they do point to memory we own, but they're
   * still conceptually constant within the lifetime of an entry.
   */

Though it might make more sense in the struct definition, not here.

> +void *strmap_get(struct strmap *map, const char *str)
> +{
> +	struct strmap_entry *entry = find_strmap_entry(map, str);
> +	return entry ? entry->value : NULL;
> +}

Just noting that the caller can't tell the difference between "no such
entry" and "the entry is storing NULL". I think the simplicity offered
by this interface makes it worth having (and being the primary one). If
some caller really needs to tell the difference between the two, we can
add another function later.

Obviously they could use strmap_contains(), but that would mean two hash
lookups.

> +/*
> + * Same as strmap_init, but for those who want to control the memory management
> + * carefully instead of using the default of strdup_strings=1.
> + * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
> + * are obsessing over minor details.)
> + */
> +void strmap_ocd_init(struct strmap *map,
> +		     int strdup_strings);

I'm not personally bothered by this name, but I wonder if some people
may be (because they have or know somebody who actually has OCD).

Perhaps strmap_init_with_options() would be a better name? It likewise
would extend well if we want to add other non-default options later.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 06/10] strmap: add more utility functions
  2020-10-13  0:40   ` [PATCH v2 06/10] strmap: add more " Elijah Newren via GitGitGadget
@ 2020-10-30 14:23     ` Jeff King
  2020-10-30 16:43       ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-10-30 14:23 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Oct 13, 2020 at 12:40:46AM +0000, Elijah Newren via GitGitGadget wrote:

> strmap_get_entry() is similar to strmap_get() except that instead of just
> returning the void* value that the string maps to, it returns the
> strmap_entry that contains both the string and the void* value (or
> NULL if the string isn't in the map).  This is helpful because it avoids
> multiple lookups, e.g. in some cases a caller would need to call:
>   * strmap_contains() to check that the map has an entry for the string
>   * strmap_get() to get the void* value
>   * <do some work to update the value>
>   * strmap_put() to update/overwrite the value

Oh, I guess I should have read ahead when responding to the last patch. :)

Yes, this function makes perfect sense to have (along with the simpler
alternatives for the callers that don't need this complexity).

>  strmap.c | 20 ++++++++++++++++++++
>  strmap.h | 38 ++++++++++++++++++++++++++++++++++++++

The implementation all looks pretty straight-forward.

> +void strmap_remove(struct strmap *map, const char *str, int free_util)
> +{
> +	struct strmap_entry entry, *ret;
> +	hashmap_entry_init(&entry.ent, strhash(str));
> +	entry.key = str;
> +	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
> +	if (!ret)
> +		return;
> +	if (free_util)
> +		free(ret->value);
> +	if (map->strdup_strings)
> +		free((char*)ret->key);
> +	free(ret);
> +}

Another spot that would be simplified by using FLEXPTRs. :)

> +/*
> + * Return whether the strmap is empty.
> + */
> +static inline int strmap_empty(struct strmap *map)
> +{
> +	return hashmap_get_size(&map->map) == 0;
> +}

Maybe:

  return strmap_get_size(&map) == 0;

would be slightly simpler (and more importantly, show callers the
equivalence between the two).

> +/*
> + * iterate through @map using @iter, @var is a pointer to a type strmap_entry
> + */
> +#define strmap_for_each_entry(mystrmap, iter, var)	\
> +	for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, \
> +						   OFFSETOF_VAR(var, ent)); \
> +		var; \
> +		var = hashmap_iter_next_entry_offset(iter, \
> +						     OFFSETOF_VAR(var, ent)))

Makes sense. This is like hashmap_for_each_entry, but we don't need
anyone to tell us the offset of "ent" within the struct.

I suspect we need the same "var = NULL" that hashmap recently got in
0ad621f61e (hashmap_for_each_entry(): workaround MSVC's runtime check
failure #3, 2020-09-30). Alternatively, I think you could drop
OFFSETOF_VAR completely in favor offsetof(struct strmap_entry, ent).

In fact, since we know the correct type for "var", we _could_ declare it
ourselves in a new block enclosing the loop. But that is probably making
the code too magic; people reading the code would say "huh? where is
entry declared?".

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 07/10] strmap: enable faster clearing and reusing of strmaps
  2020-10-13  0:40   ` [PATCH v2 07/10] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
@ 2020-10-30 14:27     ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-10-30 14:27 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Oct 13, 2020 at 12:40:47AM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> When strmaps are used heavily, such as is done by my new merge-ort
> algorithm, and strmaps need to be cleared but then re-used (because of
> e.g. picking multiple commits to cherry-pick, or due to a recursive
> merge having several different merges while recursing), free-ing and
> reallocating map->table repeatedly can add up in time, especially since
> it will likely be reallocated to a much smaller size but the previous
> merge provides a good guide to the right size to use for the next merge.
> 
> Introduce strmap_partial_clear() to take advantage of this type of
> situation; it will act similar to strmap_clear() except that
> map->table's entries are zeroed instead of map->table being free'd.
> Making use of this function reduced the cost of reset_maps() by about
> 20% in mert-ort, and dropped the overall runtime of my rebase testcase
> by just under 2%.

Oh, these were the real numbers I was looking for earlier. :)

Of course it's a little confusing because reset_maps() doesn't exist yet
in the code base this is being applied on, but I can live with that.

> +/*
> + * Similar to strmap_clear() but leaves map->map->table allocated and
> + * pre-sized so that subsequent uses won't need as many rehashings.
> + */
> +void strmap_partial_clear(struct strmap *map, int free_values);

Oh good, you anticipated my free_values suggestion from earlier. But...

> +void strmap_partial_clear(struct strmap *map, int free_util)
> +{
> +	strmap_free_entries_(map, free_util);
> +	hashmap_partial_clear(&map->map);
> +}

...the implementation didn't catch up.

Other than that the patch looks obviously correct.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map
  2020-10-13  0:40   ` [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
@ 2020-10-30 14:39     ` Jeff King
  2020-10-30 17:28       ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-10-30 14:39 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Oct 13, 2020 at 12:40:48AM +0000, Elijah Newren via GitGitGadget wrote:

> Although strmap could be used as a string->int map, one either had to
> allocate an int for every entry and then deallocate later, or one had to
> do a bunch of casting between (void*) and (intptr_t).
> 
> Add some special functions that do the casting.  Also, rename put->set
> for such wrapper functions since 'put' implied there may be some
> deallocation needed if the string was already found in the map, which
> isn't the case when we're storing an int value directly in the void*
> slot instead of using the void* slot as a pointer to data.

I think this is worth doing. That kind of casting is an implementation
detail, and it's nice for callers not to have to see it.

You might want to mention that this _could_ be done as just accessors to
strmap, but using a separate struct provides type safety against
misusing pointers as integers or vice versa.

> A note on the name: if anyone has a better name suggestion than
> strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
> not been able to come up with a better name.

I still don't have a better suggestion on the name. Another convention
could be to name map types as "map_from_to". So "struct map_str_int".
But it's pretty ugly, and strmap would become "map_str_ptr" or
something. As ugly as "strintmap" is, I like it better.

> +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
> +{
> +	struct strmap_entry *entry = find_strmap_entry(&map->map, str);
> +	if (entry) {
> +		intptr_t *whence = (intptr_t*)&entry->value;
> +		*whence += amt;
> +	}
> +	else
> +		strintmap_set(map, str, amt);
> +}

Incrementing a missing entry auto-vivifies it at 0.  That makes perfect
sense, but might be worth noting above the function in the header file.

Though maybe it's a little weird since strintmap_get() takes a default
value. Why don't we use that here? I'd have to see how its used, but
would it make sense to set a default value when initializing the map,
rather than providing it on each call?

> +/*
> + * strintmap:
> + *    A map of string -> int, typecasting the void* of strmap to an int.

Are the size and signedness of an int flexible enough for all uses?

I doubt the standard makes any promises about the relationship between
intptr_t and int, but I'd be surprised if any modern platform has an
intptr_t that isn't at least as big as an int (on most 32-bit platforms
they'll be the same, and on 64-bit ones intptr_t is strictly bigger).

Would any callers care about using the full 32-bits, though? I.e., would
they prefer casting through uintptr_t to an "unsigned int"?

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 09/10] strmap: add a strset sub-type
  2020-10-13  0:40   ` [PATCH v2 09/10] strmap: add a strset sub-type Elijah Newren via GitGitGadget
@ 2020-10-30 14:44     ` Jeff King
  2020-10-30 18:02       ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-10-30 14:44 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Oct 13, 2020 at 12:40:49AM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> Similar to adding strintmap for special-casing a string -> int mapping,
> add a strset type for cases where we really are only interested in using
> strmap for storing a set rather than a mapping.  In this case, we'll
> always just store NULL for the value but the different struct type makes
> it clearer than code comments how a variable is intended to be used.
> 
> The difference in usage also results in some differences in API: a few
> things that aren't necessary or meaningful are dropped (namely, the
> free_util argument to *_clear(), and the *_get() function), and
> strset_add() is chosen as the API instead of strset_put().

That all makes sense.

We're wasting 8 bytes of NULL pointer for each entry, but it's unlikely
to be all that important. If we later find a case where we think it
matters, we can always refactor the type not to depend on strmap.

I'd want a strset_check_and_add() to match what I used recently in
shortlog.h. Maybe strset_contains_and_add() would be a better name to
match the individual functions here. I dunno (it actually seems
clunkier).

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool
  2020-10-13  0:40   ` [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
@ 2020-10-30 14:56     ` Jeff King
  2020-10-30 19:31       ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-10-30 14:56 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Oct 13, 2020 at 12:40:50AM +0000, Elijah Newren via GitGitGadget wrote:

> For heavy users of strmaps, allowing the keys and entries to be
> allocated from a memory pool can provide significant overhead savings.
> Add an option to strmap_ocd_init() to specify a memory pool.

So this one interacts badly with my FLEXPTR suggestion.

I guess it provides most of the benefit that FLEXPTR would, because
we're getting both the entries and the strings from the mempool. Which
really ends up being an almost identical memory layout, since the
mempool presumably just gives you the N bytes for the string right after
the last thing you allocated, which would be the struct.

The only downside is that if you don't want to use the mempool (e.g.,
because you might actually strmap_remove() things), you don't get the
advantage.

I think we could fall back to a FLEXPTR when there's no mempool (or even
when there is, though you'd be on your own to reimplement the
computation parts of FLEXPTR_ALLOC). I'm not sure how ugly it would end
up.

I haven't used our mem_pool before, but the code all looks quite
straightforward to me. I guess the caller is responsible for
de-allocating the mempool, which makes sense. It would be nice to see
real numbers on how much this helps, but again, you might not have the
commits in the right order to easily find out.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 03/10] hashmap: allow re-use after hashmap_free()
  2020-10-30 13:35     ` Jeff King
@ 2020-10-30 15:37       ` Elijah Newren
  2020-11-03 16:08         ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-10-30 15:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 6:35 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:43AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Previously, once map->table had been freed, any calls to hashmap_put(),
> > hashmap_get(), or hashmap_remove() would cause a NULL pointer
> > dereference (since hashmap_free_() also zeros the memory; without that
> > zeroing, calling these functions would cause a use-after-free problem).
> >
> > Modify these functions to check for a NULL table and automatically
> > allocate as needed.
>
> Unsurprisingly, I like this direction. The code looks correct to me,
> though I think you could reduce duplication slightly by checking
> map->table in find_entry_ptr(). That covers both hashmap_get() and
> hashmap_remove(). But I'm happy either way.
>
> > I also thought about creating a HASHMAP_INIT macro to allow initializing
> > hashmaps on the stack without calling hashmap_init(), but virtually all
> > uses of hashmap specify a usecase-specific equals_function which defeats
> > the utility of such a macro.
>
> This part I disagree with. If we did:
>
>   #define HASHMAP_INIT(fn, data) = { .cmpfn = cmpfn, cmpfn_data = data }
>
> then many callers could avoid handling the lazy-init themselves. E.g.:

Ah, gotcha.  That makes sense to me.  Given that 43 out of 47 callers
of hashmap_init use cmpfn_data = NULL, should I shorten it to just one
parameter for the macro, and let the four special cases keep calling
hashmap_init() to specify a non-NULL cmpfn_data?

>
>  attr.c | 16 +++-------------
>  1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/attr.c b/attr.c
> index a826b2ef1f..55a2783f1b 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -57,7 +57,9 @@ static inline void hashmap_unlock(struct attr_hashmap *map)
>   * is a singleton object which is shared between threads.
>   * Access to this dictionary must be surrounded with a mutex.
>   */
> -static struct attr_hashmap g_attr_hashmap;
> +static struct attr_hashmap g_attr_hashmap = {
> +       HASHMAP_INIT(attr_hash_entry_cmp, NULL)
> +};
>
>  /* The container for objects stored in "struct attr_hashmap" */
>  struct attr_hash_entry {
> @@ -80,12 +82,6 @@ static int attr_hash_entry_cmp(const void *unused_cmp_data,
>         return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
>  }
>
> -/* Initialize an 'attr_hashmap' object */
> -static void attr_hashmap_init(struct attr_hashmap *map)
> -{
> -       hashmap_init(&map->map, attr_hash_entry_cmp, NULL, 0);
> -}
> -
>  /*
>   * Retrieve the 'value' stored in a hashmap given the provided 'key'.
>   * If there is no matching entry, return NULL.
> @@ -96,9 +92,6 @@ static void *attr_hashmap_get(struct attr_hashmap *map,
>         struct attr_hash_entry k;
>         struct attr_hash_entry *e;
>
> -       if (!map->map.tablesize)
> -               attr_hashmap_init(map);
> -
>         hashmap_entry_init(&k.ent, memhash(key, keylen));
>         k.key = key;
>         k.keylen = keylen;
> @@ -114,9 +107,6 @@ static void attr_hashmap_add(struct attr_hashmap *map,
>  {
>         struct attr_hash_entry *e;
>
> -       if (!map->map.tablesize)
> -               attr_hashmap_init(map);
> -
>         e = xmalloc(sizeof(struct attr_hash_entry));
>         hashmap_entry_init(&e->ent, memhash(key, keylen));
>         e->key = key;

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear()
  2020-10-30 13:41     ` Jeff King
@ 2020-10-30 16:03       ` Elijah Newren
  2020-11-03 16:10         ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-10-30 16:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 6:41 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:44AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > merge-ort is a heavy user of strmaps, which are built on hashmap.[ch].
> > reset_maps() in merge-ort was taking about 12% of overall runtime in my
> > testcase involving rebasing 35 patches of linux.git across a big rename.
> > reset_maps() was calling hashmap_free() followed by hashmap_init(),
> > meaning that not only was it freeing all the memory associated with each
> > of the strmaps just to immediately allocate a new array again, it was
> > allocating a new array that wasy likely smaller than needed (thus
>
> s/wasy/was/

Thanks; will fix.

> > resulting in later need to rehash things).  The ending size of the map
> > table on the previous commit was likely almost perfectly sized for the
> > next commit we wanted to pick, and not dropping and reallocating the
> > table immediately is a win.
> >
> > Add some new API to hashmap to clear a hashmap of entries without
> > freeing map->table (and instead only zeroing it out like alloc_table()
> > would do, along with zeroing the count of items in the table and the
> > shrink_at field).
>
> This seems like a reasonable optimization to make, and doesn't make the
> API significantly more complicated. I'd expect the allocation of actual
> entry objects to dwarf the table allocation, but I guess:
>
>   - you'll deal with the individual entries later using a mempool
>
>   - it's not just the allocation, but the re-insertion of the entries as
>     we grow
>
> It would be nice if we had some actual perf numbers to report here, so
> we could know exactly how much it was buying us. But I guess things are
> a bit out-of-order there. You want to do this series first and then
> build merge-ort on top as a user. We could introduce the basic data
> structure first, then merge-ort, and then start applying optimizations
> with real-world measurements. But I'm not sure it's worth the amount of
> time you'd have to spend to reorganize in that way.

Yeah, the perf benefits didn't really come until I added a
strmap_clear() based on this, so as you discovered I put perf numbers
in patch 7 of this series.  Should I add a mention of the later commit
message at this point in the series?

> >  hashmap.c | 39 +++++++++++++++++++++++++++------------
> >  hashmap.h | 13 ++++++++++++-
>
> The implementation itself looks correct to me. I already mentioned my
> thoughts on naming in patch 1.

I'll circle back to that when I comment on patch 1...

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 05/10] strmap: new utility functions
  2020-10-30 14:12     ` Jeff King
@ 2020-10-30 16:26       ` Elijah Newren
  0 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren @ 2020-10-30 16:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 7:12 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:45AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Add strmap as a new struct and associated utility functions,
> > specifically for hashmaps that map strings to some value.  The API is
> > taken directly from Peff's proposal at
> > https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/
>
> This looks overall sane to me. I mentioned elsewhere that we could be
> using FLEXPTR_ALLOC to save an extra allocation. I think it's easy and
> worth doing here, as the logic would be completely contained within
> strmap_put():
>
>   if (strdup_strings)
>         FLEXPTR_ALLOC_STR(entry, key, str);
>   else {
>         entry = xmalloc(sizeof(*entry));
>         entry->key = str;
>   }
>
> And free_entries() then doesn't even have to care about strdup_strings.

Yeah, as you noted in your review of 10/10 this idea wouldn't play
well with the later mem_pool changes.

> > A couple of items of note:
> >
> >   * Similar to string-list, I have a strdup_strings setting.  However,
> >     unlike string-list, strmap_init() does not take a parameter for this
> >     setting and instead automatically sets it to 1; callers who want to
> >     control this detail need to instead call strmap_ocd_init().
>
> That seems reasonable. It could just be a parameter, but I like that you
> push people in the direction of doing the simple and safe thing, rather
> than having them wonder whether they ought to set strdup_strings or not.

Well, in my first round of the series where I did make it a parameter
you balked pretty loudly.  ;-)

> >   * I do not have a STRMAP_INIT macro.  I could possibly add one, but
> >       #define STRMAP_INIT { { NULL, cmp_str_entry, NULL, 0, 0, 0, 0, 0 }, 1 }
> >     feels a bit unwieldy and possibly error-prone in terms of future
> >     expansion of the hashmap struct.  The fact that cmp_str_entry needs to
> >     be in there prevents us from passing all zeros for the hashmap, and makes
> >     me worry that STRMAP_INIT would just be more trouble than it is worth.
>
> You can actually omit everything after cmp_str_entry, and those fields
> would all get zero-initialized. But we also allow C99 designed
> initializers these days. Coupled with the HASHMAP_INIT() I mentioned in
> the earlier email, you'd have:
>
>   #define STRMAP_INIT { \
>                 .map = HASHMAP_INIT(cmp_strmap_entry, NULL), \
>                 .strdup_strings = 1, \
>           }
>
> which seems pretty maintainable.

Makes sense; will add.

> > +static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
> > +                         const struct hashmap_entry *entry1,
> > +                         const struct hashmap_entry *entry2,
> > +                         const void *keydata)
> > +{
> > +     const struct strmap_entry *e1, *e2;
> > +
> > +     e1 = container_of(entry1, const struct strmap_entry, ent);
> > +     e2 = container_of(entry2, const struct strmap_entry, ent);
> > +     return strcmp(e1->key, e2->key);
> > +}
>
> I expected to use keydata here, but it's pretty easy to make a fake
> strmap_entry because of the use of the "key" pointer. So that makes
> sense.
>
> > +static void strmap_free_entries_(struct strmap *map, int free_util)
>
> You use the term "value" for the mapped-to value in this iteration. So
> perhaps free_values here (and in other functions) would be a better
> name?

Oops, yes, definitely.

> > +     /*
> > +      * We need to iterate over the hashmap entries and free
> > +      * e->key and e->value ourselves; hashmap has no API to
> > +      * take care of that for us.  Since we're already iterating over
> > +      * the hashmap, though, might as well free e too and avoid the need
> > +      * to make some call into the hashmap API to do that.
> > +      */
> > +     hashmap_for_each_entry(&map->map, &iter, e, ent) {
> > +             if (free_util)
> > +                     free(e->value);
> > +             if (map->strdup_strings)
> > +                     free((char*)e->key);
> > +             free(e);
> > +     }
> > +}
>
> Yep, makes sense.
>
> > +void strmap_clear(struct strmap *map, int free_util)
> > +{
> > +     strmap_free_entries_(map, free_util);
> > +     hashmap_free(&map->map);
> > +}
>
> This made me wonder about a partial_clear(), but it looks like that
> comes later.
>
> > +void *strmap_put(struct strmap *map, const char *str, void *data)
> > +{
> > +     struct strmap_entry *entry = find_strmap_entry(map, str);
> > +     void *old = NULL;
> > +
> > +     if (entry) {
> > +             old = entry->value;
> > +             entry->value = data;
>
> Here's a weird hypothetical. If strdup_strings is not set and I do:
>
>   const char *one = xstrdup("foo");
>   const char *two = xstrdup("foo");
>
>   hashmap_put(map, one, x);
>   hashmap_put(map, two, y);
>
> it's clear that the value should be pointing to "y" afterwards (and you
> return "x" so the caller can free it or whatever, good).
>
> But which key should the entry be pointing to? The old one or the new
> one? I'm trying and failing to think of a case where it would matter.
> Certainly I could add a free() to the toy above where it would, but it
> feels like a real caller would have to have pretty convoluted memory
> lifetime semantics for it to make a difference.
>
> So I'm not really asking for a particular behavior, but just bringing it
> up in case you can think of something relevant.

I'll keep mulling it over, but I likewise can't currently think of a
case where it'd matter.

>
> > +     } else {
> > +             /*
> > +              * We won't modify entry->key so it really should be const.
> > +              */
> > +             const char *key = str;
>
> The "should be" here confused me. It _is_ const. I'd probably just
> delete the comment entirely, but perhaps:
>
>   /*
>    * We'll store a const pointer. For non-duplicated strings, they belong
>    * to the caller and we received them as const in the first place. For
>    * our duplicated ones, they do point to memory we own, but they're
>    * still conceptually constant within the lifetime of an entry.
>    */
>
> Though it might make more sense in the struct definition, not here.

Either I was (mistakenly) worried about "I'm going to allocate and
copy, but during the copy it isn't actually const", or this is a
leftover artifact from some of the other iterations I tried.  Anyway,
I think this comment isn't useful; I'll just strike it.

> > +void *strmap_get(struct strmap *map, const char *str)
> > +{
> > +     struct strmap_entry *entry = find_strmap_entry(map, str);
> > +     return entry ? entry->value : NULL;
> > +}
>
> Just noting that the caller can't tell the difference between "no such
> entry" and "the entry is storing NULL". I think the simplicity offered
> by this interface makes it worth having (and being the primary one). If
> some caller really needs to tell the difference between the two, we can
> add another function later.
>
> Obviously they could use strmap_contains(), but that would mean two hash
> lookups.

Yep, addressed later by strmap_get_entry() in another patch, as you
noticed in your later review.

> > +/*
> > + * Same as strmap_init, but for those who want to control the memory management
> > + * carefully instead of using the default of strdup_strings=1.
> > + * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
> > + * are obsessing over minor details.)
> > + */
> > +void strmap_ocd_init(struct strmap *map,
> > +                  int strdup_strings);
>
> I'm not personally bothered by this name, but I wonder if some people
> may be (because they have or know somebody who actually has OCD).
>
> Perhaps strmap_init_with_options() would be a better name? It likewise
> would extend well if we want to add other non-default options later.

Doh!  That's going to push a bunch of lines past 80 characters.  Sigh...

It's probably a better name though; I'll change it.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 06/10] strmap: add more utility functions
  2020-10-30 14:23     ` Jeff King
@ 2020-10-30 16:43       ` Elijah Newren
  2020-11-03 16:12         ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-10-30 16:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 7:23 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:46AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > strmap_get_entry() is similar to strmap_get() except that instead of just
> > returning the void* value that the string maps to, it returns the
> > strmap_entry that contains both the string and the void* value (or
> > NULL if the string isn't in the map).  This is helpful because it avoids
> > multiple lookups, e.g. in some cases a caller would need to call:
> >   * strmap_contains() to check that the map has an entry for the string
> >   * strmap_get() to get the void* value
> >   * <do some work to update the value>
> >   * strmap_put() to update/overwrite the value
>
> Oh, I guess I should have read ahead when responding to the last patch. :)
>
> Yes, this function makes perfect sense to have (along with the simpler
> alternatives for the callers that don't need this complexity).
>
> >  strmap.c | 20 ++++++++++++++++++++
> >  strmap.h | 38 ++++++++++++++++++++++++++++++++++++++
>
> The implementation all looks pretty straight-forward.
>
> > +void strmap_remove(struct strmap *map, const char *str, int free_util)
> > +{
> > +     struct strmap_entry entry, *ret;
> > +     hashmap_entry_init(&entry.ent, strhash(str));
> > +     entry.key = str;
> > +     ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
> > +     if (!ret)
> > +             return;
> > +     if (free_util)
> > +             free(ret->value);
> > +     if (map->strdup_strings)
> > +             free((char*)ret->key);
> > +     free(ret);
> > +}
>
> Another spot that would be simplified by using FLEXPTRs. :)
>
> > +/*
> > + * Return whether the strmap is empty.
> > + */
> > +static inline int strmap_empty(struct strmap *map)
> > +{
> > +     return hashmap_get_size(&map->map) == 0;
> > +}
>
> Maybe:
>
>   return strmap_get_size(&map) == 0;
>
> would be slightly simpler (and more importantly, show callers the
> equivalence between the two).

Makes sense; will change it.

> > +/*
> > + * iterate through @map using @iter, @var is a pointer to a type strmap_entry
> > + */
> > +#define strmap_for_each_entry(mystrmap, iter, var)   \
> > +     for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, \
> > +                                                OFFSETOF_VAR(var, ent)); \
> > +             var; \
> > +             var = hashmap_iter_next_entry_offset(iter, \
> > +                                                  OFFSETOF_VAR(var, ent)))
>
> Makes sense. This is like hashmap_for_each_entry, but we don't need
> anyone to tell us the offset of "ent" within the struct.
>
> I suspect we need the same "var = NULL" that hashmap recently got in
> 0ad621f61e (hashmap_for_each_entry(): workaround MSVC's runtime check
> failure #3, 2020-09-30). Alternatively, I think you could drop
> OFFSETOF_VAR completely in favor offsetof(struct strmap_entry, ent).
>
> In fact, since we know the correct type for "var", we _could_ declare it
> ourselves in a new block enclosing the loop. But that is probably making
> the code too magic; people reading the code would say "huh? where is
> entry declared?".

Actually, since we know ent is the first entry in strmap, the offset
is always 0.  So can't we just avoid OFFSETOF_VAR() and offsetof()
entirely, by just using hashmap_iter_first() and hashmap_iter_next()?
I'm going to try that.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map
  2020-10-30 14:39     ` Jeff King
@ 2020-10-30 17:28       ` Elijah Newren
  2020-11-03 16:20         ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-10-30 17:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 7:39 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:48AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Although strmap could be used as a string->int map, one either had to
> > allocate an int for every entry and then deallocate later, or one had to
> > do a bunch of casting between (void*) and (intptr_t).
> >
> > Add some special functions that do the casting.  Also, rename put->set
> > for such wrapper functions since 'put' implied there may be some
> > deallocation needed if the string was already found in the map, which
> > isn't the case when we're storing an int value directly in the void*
> > slot instead of using the void* slot as a pointer to data.
>
> I think this is worth doing. That kind of casting is an implementation
> detail, and it's nice for callers not to have to see it.
>
> You might want to mention that this _could_ be done as just accessors to
> strmap, but using a separate struct provides type safety against
> misusing pointers as integers or vice versa.

If I just did it as accessors, it makes it harder for myself and
others to remember what my huge piles of strmaps in merge-ort do; I
found that it became easier to follow the code and remember what
things were doing when some were marked as strmap, some as strintmap,
and some as strset.

> > A note on the name: if anyone has a better name suggestion than
> > strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
> > not been able to come up with a better name.
>
> I still don't have a better suggestion on the name. Another convention
> could be to name map types as "map_from_to". So "struct map_str_int".
> But it's pretty ugly, and strmap would become "map_str_ptr" or
> something. As ugly as "strintmap" is, I like it better.
>
> > +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
> > +{
> > +     struct strmap_entry *entry = find_strmap_entry(&map->map, str);
> > +     if (entry) {
> > +             intptr_t *whence = (intptr_t*)&entry->value;
> > +             *whence += amt;
> > +     }
> > +     else
> > +             strintmap_set(map, str, amt);
> > +}
>
> Incrementing a missing entry auto-vivifies it at 0.  That makes perfect
> sense, but might be worth noting above the function in the header file.
>
> Though maybe it's a little weird since strintmap_get() takes a default
> value. Why don't we use that here? I'd have to see how its used, but
> would it make sense to set a default value when initializing the map,
> rather than providing it on each call?

That probably makes sense.  It turns out there is one strintmap for
which I call strintmap_get() in two different places with different
default values, but I think I can fix that up (one of them really
needed -1 as the default, while the other callsite just needed the
default to not accidentally match a specific enum value and 0 was
convenient).

>
> > +/*
> > + * strintmap:
> > + *    A map of string -> int, typecasting the void* of strmap to an int.
>
> Are the size and signedness of an int flexible enough for all uses?

If some users want signed values and others want unsigned, I'm not
sure how we can satisfy both.  Maybe make a struintmap?

Perhaps that could be added later if uses come up for it?  Some of my
uses need int, the rest of them wouldn't care about int vs unsigned.

> I doubt the standard makes any promises about the relationship between
> intptr_t and int, but I'd be surprised if any modern platform has an
> intptr_t that isn't at least as big as an int (on most 32-bit platforms
> they'll be the same, and on 64-bit ones intptr_t is strictly bigger).
>
> Would any callers care about using the full 32-bits, though? I.e., would
> they prefer casting through uintptr_t to an "unsigned int"?

I don't care about the full 32 bits (I'll probably use less than 16),
but I absolutely wanted it signed for my uses.  I think it makes sense
to be signed when using it for an index within an array (-1 for "not
found" makes sense; using arbitrary large numbers seems really ugly
(and perhaps buggy) to me).  It also makes sense to me to use -1 as an
invalid enum value, though I guess I could technically specify an
additional "INVALID_VALUE" within the enum and use it as the default.

If someone does care about the full range of bits up to 64 on relevant
platforms, I guess I should make it strintptr_t_map.  But besides the
egregiously ugly name, one advantage of int over intptr_t (or unsigned
over uintptr_t) is that you can use it in a printf easily:
   printf("Size: %d\n", strintmap_get(&foo, 0));
whereas if it strintmap_get() returns an intptr_t, then it's a royal
mess to attempt to portably use it without adding additional manual
casts.  Maybe I was just missing something obvious, but I couldn't
figure out the %d, %ld, %lld, PRIdMAX, etc. choices and get the
statement to compile on all platforms, so I'd always just cast to int
or unsigned at the time of calling printf.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 09/10] strmap: add a strset sub-type
  2020-10-30 14:44     ` Jeff King
@ 2020-10-30 18:02       ` Elijah Newren
  0 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren @ 2020-10-30 18:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 7:44 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:49AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > Similar to adding strintmap for special-casing a string -> int mapping,
> > add a strset type for cases where we really are only interested in using
> > strmap for storing a set rather than a mapping.  In this case, we'll
> > always just store NULL for the value but the different struct type makes
> > it clearer than code comments how a variable is intended to be used.
> >
> > The difference in usage also results in some differences in API: a few
> > things that aren't necessary or meaningful are dropped (namely, the
> > free_util argument to *_clear(), and the *_get() function), and
> > strset_add() is chosen as the API instead of strset_put().
>
> That all makes sense.
>
> We're wasting 8 bytes of NULL pointer for each entry, but it's unlikely
> to be all that important. If we later find a case where we think it
> matters, we can always refactor the type not to depend on strmap.
>
> I'd want a strset_check_and_add() to match what I used recently in
> shortlog.h. Maybe strset_contains_and_add() would be a better name to
> match the individual functions here. I dunno (it actually seems
> clunkier).

Yeah, I'll just go with strset_check_and_add().  :-)

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool
  2020-10-30 14:56     ` Jeff King
@ 2020-10-30 19:31       ` Elijah Newren
  2020-11-03 16:24         ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-10-30 19:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 7:56 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:50AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > For heavy users of strmaps, allowing the keys and entries to be
> > allocated from a memory pool can provide significant overhead savings.
> > Add an option to strmap_ocd_init() to specify a memory pool.
>
> So this one interacts badly with my FLEXPTR suggestion.
>
> I guess it provides most of the benefit that FLEXPTR would, because
> we're getting both the entries and the strings from the mempool. Which
> really ends up being an almost identical memory layout, since the
> mempool presumably just gives you the N bytes for the string right after
> the last thing you allocated, which would be the struct.
>
> The only downside is that if you don't want to use the mempool (e.g.,
> because you might actually strmap_remove() things), you don't get the
> advantage.
>
> I think we could fall back to a FLEXPTR when there's no mempool (or even
> when there is, though you'd be on your own to reimplement the
> computation parts of FLEXPTR_ALLOC). I'm not sure how ugly it would end
> up.

Yeah, we'd need a mempool-specific reimplementation of FLEXPTR_ALLOC
with the mempool, and just avoid using it at all whenever
strdup_strings was 0.  Seems slightly ugly, but maybe it wouldn't be
too bad.  I could look into it.

> I haven't used our mem_pool before, but the code all looks quite
> straightforward to me. I guess the caller is responsible for
> de-allocating the mempool, which makes sense. It would be nice to see
> real numbers on how much this helps, but again, you might not have the
> commits in the right order to easily find out.

At the time I implemented it, I did grab some numbers.  It varied
quite a bit between different cases, since a lot of my strmaps are for
tracking when special cases arise and we can implement various
optimizations.  Naturally, a usecase which involves heavier use of
strmaps will mean greater benefits from using a mempool.  Also, if I
had implemented it later, after one rename-related optimization I
hadn't yet discovered at the time, then it would have shown a larger
relative reduction in overall execution time.  Anyway, at the time I
put the mempool into strmaps and made use of it in relevant places,
one of my rebase testcases saw an almost 5% reduction in overall
execution time.  I'm sure it would have been over 5% if I had
reordered it to come after my final rename optimization.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-10-30 12:50     ` Jeff King
@ 2020-10-30 19:55       ` Elijah Newren
  2020-11-03 16:26         ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-10-30 19:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 5:51 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:41AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > The existence of hashmap_free() and hashmap_free_entries() confused me,
> > and the docs weren't clear enough.  We are dealing with a map table,
> > entries in that table, and possibly also things each of those entries
> > point to.  I had to consult other source code examples and the
> > implementation.  Add a brief note to clarify the differences.  This will
> > become even more important once we introduce a new
> > hashmap_partial_clear() function which will add the question of whether
> > the table itself has been freed.
>
> This is a definite improvement, and I don't see any inaccuracies in the
> descriptions. I do think some re-naming would help in the long run,
> though. E.g.:
>
>   - hashmap_clear() - remove all entries and de-allocate any
>     hashmap-specific data, but be ready for reuse
>
>   - hashmap_clear_and_free() - ditto, but free the entries themselves
>
>   - hashmap_partial_clear() - remove all entries but don't deallocate
>     table
>
>   - hashmap_partial_clear_and_free() - ditto, but free the entries
>
> So always call it "clear", but allow options in two dimensions (partial
> or not, free entries or not).
>
> Those could be parameters to a single function, but I think it gets a
> little ugly because "and_free" requires passing in the type of the
> entries in order to find the pointers.
>
> The "not" cases are implied in the names, but hashmap_clear_full() would
> be OK with me, too.
>
> But I think in the current scheme that "free" is somewhat overloaded,
> and if we end with a "clear" and a "free" that seems confusing to me.

Hmm...there are quite a few calls to hashmap_free() and
hashmap_free_entries() throughout the codebase.  I'm wondering if I
should make switching these over to your new naming suggestions a
separate follow-on series from this one, so that if there are any
conflicts with other series it doesn't need to hold these first 10
patches up.

If I do that, I could also add a patch to convert several callers of
hashmap_init() to use the new HASHMAP_INIT() macro, and another patch
to convert shortlog to using my strset instead of its own.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 00/13] Add struct strmap and associated utility functions
  2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
                     ` (9 preceding siblings ...)
  2020-10-13  0:40   ` [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
@ 2020-11-02 18:55   ` Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
                       ` (14 more replies)
  10 siblings, 15 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

Here I introduce a new strmap type (and strintmap and strset), which my new
merge backed, merge-ort, uses heavily. (I also made significant use of it in
my changes to diffcore-rename). This strmap type was based on Peff's
proposal from a couple years ago[1], but has additions that I made as I used
it, and a number of additions/changes suggested by Peff in his reviews. I
also start the series off with some changes to hashmap, based on Peff's
feedback on v1 & v2.

NOTE: My "fundamentals of merge-ort implementation"[2] that depends on this
series needs to be updated due to these changes. I'll send a reroll for it
soon.

Changes since v2 (almost all of which were suggestions from Peff):

 * Added HASHMAP_INIT() and STR*_INIT() macros (and added a patch converting
   some existing callsites to use them)
 * Introduced the improved hashmap deallocation function names that Peff
   suggested and updated the codebase (currently merges cleanly with seen,
   though there's always a risk someone else is introducing a new one, but
   it's at least clean right now)
 * Renamed free_util -> free_values, everywhere this time
 * Renamed strmap_ocd_init() -> strmap_init_with_options(). Similar for the
   other str* subtypes.
 * Implemented strmap_empty() on top of strmap_get_size() instead of
   hashmap_get_size()
 * Avoided the OFFSETOF_VAR initialization-on-windows concerns by just not
   using the macro; for strmap_entry, the offset is always 0
 * Stored the default_value for a strintmap in the strintmap rather than
   requiring it at every call to strintmap_get(). Updated strintmap_incr()
   to make use of it as well.
 * Add a strset_check_and_add() function to the API
 * Added an extra patch at the end to take advantage of FLEXPTR_ALLOC_STR in
   the default case to avoid an extra allocation and free.
 * Tweaked some commit messages, fixed a few more argument-alignment issues,
   removed a bad comment

Things that I'm still unsure about:

 * strintmap_init() takes a default_value parameter, as suggested by Peff.
   But this makes the function name strintmap_init_with_options() weird,
   because strintmap_init() already takes one option, so it seems like the
   name needs to replace "options" with "more_options". But that's kinda
   ugly too. I'm guessing strintmap_init_with_options() is fine as-is, but
   I'm wondering if anyone else thinks it looks weird and if so if there is
   anything I should do about it.

Things Peff mentioned on v2 that I did NOT do:

 * Peff brought up some questions about mapping strintmap to an int rather
   than an unsigned or intptr_t. I discussed my rationale in the thread

Things Peff mentioned on v1 that are still not included and which Peff
didn't comment on for v2, but which may still be worth mentioning again:

 * Peff brought up the idea of having a free_values member instead of having
   a free_values parameter to strmap_clear(). That'd just mean moving the
   parameter from strmap_clear() to strmap_init() and would be easy to do,
   but he sounded like he was just throwing it out as an idea and I didn't
   have a strong opinion, so I left it as-is. If others have
   opinions/preferences, changing it is easy right now.
 * Peff early on wanted the strmap_entry type to have a char key[FLEX_ALLOC]
   instead of having a (const) char *key. I spent a couple more days on this
   despite him not mentioning it while reviewing v2, and finally got it
   working this time and running valgrind-free. Note that such a change
   means always copying the key instead of allowing it as an option. After
   implementing it, I timed it and it slowed down my important testcase by
   just over 6%. So I chucked it. I think the FLEXPTR_ALLOC_STR usage in
   combination with defaulting to strdup_strings=1 gives us most the
   benefits Peff wanted, while still allowing merge-ort to reuse strings
   when it's important.

[1] 
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/[2] 
https://lore.kernel.org/git/CABPp-BHKGkx04neULtYUyfiU+z-X7_rxQqriSEjxZjU1oXokOA@mail.gmail.com/T/#t
[3] 
https://lore.kernel.org/git/CABPp-BFyqTthyBmp5yt+iUniwTi+=y2QcBcmNnnCy=zvyi3Rbw@mail.gmail.com/

Elijah Newren (13):
  hashmap: add usage documentation explaining hashmap_free[_entries]()
  hashmap: adjust spacing to fix argument alignment
  hashmap: allow re-use after hashmap_free()
  hashmap: introduce a new hashmap_partial_clear()
  hashmap: provide deallocation function names
  strmap: new utility functions
  strmap: add more utility functions
  strmap: enable faster clearing and reusing of strmaps
  strmap: add functions facilitating use as a string->int map
  strmap: add a strset sub-type
  strmap: enable allocations to come from a mem_pool
  strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  Use new HASHMAP_INIT macro to simplify hashmap initialization

 Makefile                |   1 +
 add-interactive.c       |   2 +-
 attr.c                  |  26 ++--
 blame.c                 |   2 +-
 bloom.c                 |   5 +-
 builtin/difftool.c      |   9 +-
 builtin/fetch.c         |   6 +-
 builtin/shortlog.c      |   2 +-
 config.c                |   2 +-
 diff.c                  |   4 +-
 diffcore-rename.c       |   2 +-
 dir.c                   |   8 +-
 hashmap.c               |  74 +++++++----
 hashmap.h               |  91 ++++++++++---
 merge-recursive.c       |   6 +-
 name-hash.c             |   4 +-
 object.c                |   2 +-
 oidmap.c                |   2 +-
 patch-ids.c             |   2 +-
 range-diff.c            |   6 +-
 ref-filter.c            |   2 +-
 revision.c              |  11 +-
 sequencer.c             |   4 +-
 strmap.c                | 158 +++++++++++++++++++++++
 strmap.h                | 280 ++++++++++++++++++++++++++++++++++++++++
 submodule-config.c      |   4 +-
 t/helper/test-hashmap.c |   9 +-
 27 files changed, 610 insertions(+), 114 deletions(-)
 create mode 100644 strmap.c
 create mode 100644 strmap.h


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-835%2Fnewren%2Fstrmap-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-835/newren/strmap-v3
Pull-Request: https://github.com/git/git/pull/835

Range-diff vs v2:

  1:  af6b6fcb46 =  1:  af6b6fcb46 hashmap: add usage documentation explaining hashmap_free[_entries]()
  2:  75f17619e9 !  2:  591161fd78 hashmap: adjust spacing to fix argument alignment
     @@ hashmap.c: struct hashmap_entry *hashmap_remove(struct hashmap *map,
       {
       	struct hashmap_entry *old = hashmap_remove(map, entry, NULL);
       	hashmap_add(map, entry);
     +
     + ## hashmap.h ##
     +@@ hashmap.h: struct hashmap {
     +  * prevent expensive resizing. If 0, the table is dynamically resized.
     +  */
     + void hashmap_init(struct hashmap *map,
     +-			 hashmap_cmp_fn equals_function,
     +-			 const void *equals_function_data,
     +-			 size_t initial_size);
     ++		  hashmap_cmp_fn equals_function,
     ++		  const void *equals_function_data,
     ++		  size_t initial_size);
     + 
     + /* internal function for freeing hashmap */
     + void hashmap_free_(struct hashmap *map, ssize_t offset);
     +@@ hashmap.h: void hashmap_free_(struct hashmap *map, ssize_t offset);
     +  * and if it is on stack, you can just let it go out of scope).
     +  */
     + static inline void hashmap_entry_init(struct hashmap_entry *e,
     +-					unsigned int hash)
     ++				      unsigned int hash)
     + {
     + 	e->hash = hash;
     + 	e->next = NULL;
     +@@ hashmap.h: static inline unsigned int hashmap_get_size(struct hashmap *map)
     +  * to `hashmap_cmp_fn` to decide whether the entry matches the key.
     +  */
     + struct hashmap_entry *hashmap_get(const struct hashmap *map,
     +-				const struct hashmap_entry *key,
     +-				const void *keydata);
     ++				  const struct hashmap_entry *key,
     ++				  const void *keydata);
     + 
     + /*
     +  * Returns the hashmap entry for the specified hash code and key data,
     +@@ hashmap.h: static inline struct hashmap_entry *hashmap_get_from_hash(
     +  * call to `hashmap_get` or `hashmap_get_next`.
     +  */
     + struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
     +-			const struct hashmap_entry *entry);
     ++				       const struct hashmap_entry *entry);
     + 
     + /*
     +  * Adds a hashmap entry. This allows to add duplicate entries (i.e.
     +@@ hashmap.h: void hashmap_add(struct hashmap *map, struct hashmap_entry *entry);
     +  * Returns the replaced entry, or NULL if not found (i.e. the entry was added).
     +  */
     + struct hashmap_entry *hashmap_put(struct hashmap *map,
     +-				struct hashmap_entry *entry);
     ++				  struct hashmap_entry *entry);
     + 
     + /*
     +  * Adds or replaces a hashmap entry contained within @keyvar,
     +@@ hashmap.h: struct hashmap_entry *hashmap_put(struct hashmap *map,
     +  * Argument explanation is the same as in `hashmap_get`.
     +  */
     + struct hashmap_entry *hashmap_remove(struct hashmap *map,
     +-					const struct hashmap_entry *key,
     +-					const void *keydata);
     ++				     const struct hashmap_entry *key,
     ++				     const void *keydata);
     + 
     + /*
     +  * Removes a hashmap entry contained within @keyvar,
     +@@ hashmap.h: struct hashmap_entry *hashmap_iter_next(struct hashmap_iter *iter);
     + 
     + /* Initializes the iterator and returns the first entry, if any. */
     + static inline struct hashmap_entry *hashmap_iter_first(struct hashmap *map,
     +-		struct hashmap_iter *iter)
     ++						       struct hashmap_iter *iter)
     + {
     + 	hashmap_iter_init(map, iter);
     + 	return hashmap_iter_next(iter);
  3:  a686d0758a !  3:  f2718d036d hashmap: allow re-use after hashmap_free()
     @@ Commit message
          Modify these functions to check for a NULL table and automatically
          allocate as needed.
      
     -    I also thought about creating a HASHMAP_INIT macro to allow initializing
     -    hashmaps on the stack without calling hashmap_init(), but virtually all
     -    uses of hashmap specify a usecase-specific equals_function which defeats
     -    the utility of such a macro.
     +    Also add a HASHMAP_INIT(fn, data) macro for initializing hashmaps on the
     +    stack without calling hashmap_init().
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ hashmap.c: struct hashmap_entry *hashmap_remove(struct hashmap *map,
       	if (!*e)
       		return NULL;
       
     +
     + ## hashmap.h ##
     +@@ hashmap.h: struct hashmap {
     + 
     + /* hashmap functions */
     + 
     ++#define HASHMAP_INIT(fn, data) { .cmpfn = fn, .cmpfn_data = data, \
     ++				 .do_count_items = 1 }
     ++
     + /*
     +  * Initializes a hashmap structure.
     +  *
  4:  061ab45a9b !  4:  61f1da3c51 hashmap: introduce a new hashmap_partial_clear()
     @@ Commit message
          hashmap: introduce a new hashmap_partial_clear()
      
          merge-ort is a heavy user of strmaps, which are built on hashmap.[ch].
     -    reset_maps() in merge-ort was taking about 12% of overall runtime in my
     -    testcase involving rebasing 35 patches of linux.git across a big rename.
     -    reset_maps() was calling hashmap_free() followed by hashmap_init(),
     -    meaning that not only was it freeing all the memory associated with each
     -    of the strmaps just to immediately allocate a new array again, it was
     -    allocating a new array that wasy likely smaller than needed (thus
     -    resulting in later need to rehash things).  The ending size of the map
     -    table on the previous commit was likely almost perfectly sized for the
     -    next commit we wanted to pick, and not dropping and reallocating the
     -    table immediately is a win.
     +    clear_or_reinit_internal_opts() in merge-ort was taking about 12% of
     +    overall runtime in my testcase involving rebasing 35 patches of
     +    linux.git across a big rename.  clear_or_reinit_internal_opts() was
     +    calling hashmap_free() followed by hashmap_init(), meaning that not only
     +    was it freeing all the memory associated with each of the strmaps just
     +    to immediately allocate a new array again, it was allocating a new array
     +    that was likely smaller than needed (thus resulting in later need to
     +    rehash things).  The ending size of the map table on the previous commit
     +    was likely almost perfectly sized for the next commit we wanted to pick,
     +    and not dropping and reallocating the table immediately is a win.
      
          Add some new API to hashmap to clear a hashmap of entries without
          freeing map->table (and instead only zeroing it out like alloc_table()
     @@ hashmap.c: void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function
      
       ## hashmap.h ##
      @@ hashmap.h: void hashmap_init(struct hashmap *map,
     - 			 const void *equals_function_data,
     - 			 size_t initial_size);
     + 		  const void *equals_function_data,
     + 		  size_t initial_size);
       
      -/* internal function for freeing hashmap */
      +/* internal functions for clearing or freeing hashmap */
  -:  ---------- >  5:  861e8d65ae hashmap: provide deallocation function names
  5:  5c7507f55b !  6:  448d3b219f strmap: new utility functions
     @@ Commit message
          taken directly from Peff's proposal at
          https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/
      
     -    A couple of items of note:
     -
     -      * Similar to string-list, I have a strdup_strings setting.  However,
     -        unlike string-list, strmap_init() does not take a parameter for this
     -        setting and instead automatically sets it to 1; callers who want to
     -        control this detail need to instead call strmap_ocd_init().
     -
     -      * I do not have a STRMAP_INIT macro.  I could possibly add one, but
     -          #define STRMAP_INIT { { NULL, cmp_str_entry, NULL, 0, 0, 0, 0, 0 }, 1 }
     -        feels a bit unwieldy and possibly error-prone in terms of future
     -        expansion of the hashmap struct.  The fact that cmp_str_entry needs to
     -        be in there prevents us from passing all zeros for the hashmap, and makes
     -        me worry that STRMAP_INIT would just be more trouble than it is worth.
     +    Note that similar string-list, I have a strdup_strings setting.
     +    However, unlike string-list, strmap_init() does not take a parameter for
     +    this setting and instead automatically sets it to 1; callers who want to
     +    control this detail need to instead call strmap_init_with_options().
     +    (Future patches will add additional parameters to
     +    strmap_init_with_options()).
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ strmap.c (new)
      +#include "git-compat-util.h"
      +#include "strmap.h"
      +
     -+static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
     -+			    const struct hashmap_entry *entry1,
     -+			    const struct hashmap_entry *entry2,
     -+			    const void *keydata)
     ++int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
     ++		     const struct hashmap_entry *entry1,
     ++		     const struct hashmap_entry *entry2,
     ++		     const void *keydata)
      +{
      +	const struct strmap_entry *e1, *e2;
      +
     @@ strmap.c (new)
      +
      +void strmap_init(struct strmap *map)
      +{
     -+	strmap_ocd_init(map, 1);
     ++	strmap_init_with_options(map, 1);
      +}
      +
     -+void strmap_ocd_init(struct strmap *map,
     -+		     int strdup_strings)
     ++void strmap_init_with_options(struct strmap *map,
     ++			      int strdup_strings)
      +{
      +	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
      +	map->strdup_strings = strdup_strings;
      +}
      +
     -+static void strmap_free_entries_(struct strmap *map, int free_util)
     ++static void strmap_free_entries_(struct strmap *map, int free_values)
      +{
      +	struct hashmap_iter iter;
      +	struct strmap_entry *e;
     @@ strmap.c (new)
      +	 * to make some call into the hashmap API to do that.
      +	 */
      +	hashmap_for_each_entry(&map->map, &iter, e, ent) {
     -+		if (free_util)
     ++		if (free_values)
      +			free(e->value);
      +		if (map->strdup_strings)
      +			free((char*)e->key);
     @@ strmap.c (new)
      +	}
      +}
      +
     -+void strmap_clear(struct strmap *map, int free_util)
     ++void strmap_clear(struct strmap *map, int free_values)
      +{
     -+	strmap_free_entries_(map, free_util);
     -+	hashmap_free(&map->map);
     ++	strmap_free_entries_(map, free_values);
     ++	hashmap_clear(&map->map);
      +}
      +
      +void *strmap_put(struct strmap *map, const char *str, void *data)
     @@ strmap.c (new)
      +		old = entry->value;
      +		entry->value = data;
      +	} else {
     -+		/*
     -+		 * We won't modify entry->key so it really should be const.
     -+		 */
      +		const char *key = str;
      +
      +		entry = xmalloc(sizeof(*entry));
     @@ strmap.h (new)
      +	void *value;
      +};
      +
     ++int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
     ++		     const struct hashmap_entry *entry1,
     ++		     const struct hashmap_entry *entry2,
     ++		     const void *keydata);
     ++
     ++#define STRMAP_INIT { \
     ++			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
     ++			.strdup_strings = 1,                          \
     ++		    }
     ++
      +/*
      + * Initialize the members of the strmap.  Any keys added to the strmap will
      + * be strdup'ed with their memory managed by the strmap.
     @@ strmap.h (new)
      +/*
      + * Same as strmap_init, but for those who want to control the memory management
      + * carefully instead of using the default of strdup_strings=1.
     -+ * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
     -+ * are obsessing over minor details.)
      + */
     -+void strmap_ocd_init(struct strmap *map,
     -+		     int strdup_strings);
     ++void strmap_init_with_options(struct strmap *map,
     ++			      int strdup_strings);
      +
      +/*
      + * Remove all entries from the map, releasing any allocated resources.
  6:  61b5bf1110 !  7:  42633b8d03 strmap: add more utility functions
     @@ Commit message
          strmap: add more utility functions
      
          This adds a number of additional convienence functions I want/need:
     -      * strmap_empty()
            * strmap_get_size()
     +      * strmap_empty()
            * strmap_remove()
            * strmap_for_each_entry()
            * strmap_get_entry()
     @@ strmap.c: int strmap_contains(struct strmap *map, const char *str)
       	return find_strmap_entry(map, str) != NULL;
       }
      +
     -+void strmap_remove(struct strmap *map, const char *str, int free_util)
     ++void strmap_remove(struct strmap *map, const char *str, int free_value)
      +{
      +	struct strmap_entry entry, *ret;
      +	hashmap_entry_init(&entry.ent, strhash(str));
     @@ strmap.c: int strmap_contains(struct strmap *map, const char *str)
      +	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
      +	if (!ret)
      +		return;
     -+	if (free_util)
     ++	if (free_value)
      +		free(ret->value);
      +	if (map->strdup_strings)
      +		free((char*)ret->key);
     @@ strmap.h: void *strmap_get(struct strmap *map, const char *str);
      +void strmap_remove(struct strmap *map, const char *str, int free_value);
      +
      +/*
     -+ * Return whether the strmap is empty.
     ++ * Return how many entries the strmap has.
      + */
     -+static inline int strmap_empty(struct strmap *map)
     ++static inline unsigned int strmap_get_size(struct strmap *map)
      +{
     -+	return hashmap_get_size(&map->map) == 0;
     ++	return hashmap_get_size(&map->map);
      +}
      +
      +/*
     -+ * Return how many entries the strmap has.
     ++ * Return whether the strmap is empty.
      + */
     -+static inline unsigned int strmap_get_size(struct strmap *map)
     ++static inline int strmap_empty(struct strmap *map)
      +{
     -+	return hashmap_get_size(&map->map);
     ++	return strmap_get_size(map) == 0;
      +}
      +
      +/*
      + * iterate through @map using @iter, @var is a pointer to a type strmap_entry
      + */
      +#define strmap_for_each_entry(mystrmap, iter, var)	\
     -+	for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, \
     -+						   OFFSETOF_VAR(var, ent)); \
     ++	for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, 0); \
      +		var; \
     -+		var = hashmap_iter_next_entry_offset(iter, \
     -+						     OFFSETOF_VAR(var, ent)))
     ++		var = hashmap_iter_next_entry_offset(iter, 0))
      +
       #endif /* STRMAP_H */
  7:  2ebce0c5d8 !  8:  ea942eb803 strmap: enable faster clearing and reusing of strmaps
     @@ Commit message
          Introduce strmap_partial_clear() to take advantage of this type of
          situation; it will act similar to strmap_clear() except that
          map->table's entries are zeroed instead of map->table being free'd.
     -    Making use of this function reduced the cost of reset_maps() by about
     -    20% in mert-ort, and dropped the overall runtime of my rebase testcase
     -    by just under 2%.
     +    Making use of this function reduced the cost of
     +    clear_or_reinit_internal_opts() by about 20% in mert-ort, and dropped
     +    the overall runtime of my rebase testcase by just under 2%.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## strmap.c ##
     -@@ strmap.c: void strmap_clear(struct strmap *map, int free_util)
     - 	hashmap_free(&map->map);
     +@@ strmap.c: void strmap_clear(struct strmap *map, int free_values)
     + 	hashmap_clear(&map->map);
       }
       
     -+void strmap_partial_clear(struct strmap *map, int free_util)
     ++void strmap_partial_clear(struct strmap *map, int free_values)
      +{
     -+	strmap_free_entries_(map, free_util);
     ++	strmap_free_entries_(map, free_values);
      +	hashmap_partial_clear(&map->map);
      +}
      +
     @@ strmap.c: void strmap_clear(struct strmap *map, int free_util)
       	struct strmap_entry *entry = find_strmap_entry(map, str);
      
       ## strmap.h ##
     -@@ strmap.h: void strmap_ocd_init(struct strmap *map,
     +@@ strmap.h: void strmap_init_with_options(struct strmap *map,
        */
       void strmap_clear(struct strmap *map, int free_values);
       
  8:  cc8d702f98 !  9:  c1d2172171 strmap: add functions facilitating use as a string->int map
     @@ Commit message
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## strmap.c ##
     -@@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_util)
     +@@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_value)
       		free((char*)ret->key);
       	free(ret);
       }
     @@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_util)
      +		*whence += amt;
      +	}
      +	else
     -+		strintmap_set(map, str, amt);
     ++		strintmap_set(map, str, map->default_value + amt);
      +}
      
       ## strmap.h ##
     -@@ strmap.h: static inline unsigned int strmap_get_size(struct strmap *map)
     - 		var = hashmap_iter_next_entry_offset(iter, \
     - 						     OFFSETOF_VAR(var, ent)))
     +@@ strmap.h: int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
     + 			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
     + 			.strdup_strings = 1,                          \
     + 		    }
     ++#define STRINTMAP_INIT { \
     ++			.map.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
     ++			.map.strdup_strings = 1,                          \
     ++			.default_value = 0,                               \
     ++		    }
     + 
     + /*
     +  * Initialize the members of the strmap.  Any keys added to the strmap will
     +@@ strmap.h: static inline int strmap_empty(struct strmap *map)
     + 		var; \
     + 		var = hashmap_iter_next_entry_offset(iter, 0))
       
      +
      +/*
     @@ strmap.h: static inline unsigned int strmap_get_size(struct strmap *map)
      +
      +struct strintmap {
      +	struct strmap map;
     ++	int default_value;
      +};
      +
      +#define strintmap_for_each_entry(mystrmap, iter, var)	\
      +	strmap_for_each_entry(&(mystrmap)->map, iter, var)
      +
     -+static inline void strintmap_init(struct strintmap *map)
     ++static inline void strintmap_init(struct strintmap *map, int default_value)
      +{
      +	strmap_init(&map->map);
     ++	map->default_value = default_value;
      +}
      +
     -+static inline void strintmap_ocd_init(struct strintmap *map,
     -+				      int strdup_strings)
     ++static inline void strintmap_init_with_options(struct strintmap *map,
     ++					       int default_value,
     ++					       int strdup_strings)
      +{
     -+	strmap_ocd_init(&map->map, strdup_strings);
     ++	strmap_init_with_options(&map->map, strdup_strings);
     ++	map->default_value = default_value;
      +}
      +
      +static inline void strintmap_clear(struct strintmap *map)
     @@ strmap.h: static inline unsigned int strmap_get_size(struct strmap *map)
      +	return strmap_get_size(&map->map);
      +}
      +
     -+static inline int strintmap_get(struct strintmap *map, const char *str,
     -+				int default_value)
     ++/*
     ++ * Returns the value for str in the map.  If str isn't found in the map,
     ++ * the map's default_value is returned.
     ++ */
     ++static inline int strintmap_get(struct strintmap *map, const char *str)
      +{
      +	struct strmap_entry *result = strmap_get_entry(&map->map, str);
      +	if (!result)
     -+		return default_value;
     ++		return map->default_value;
      +	return (intptr_t)result->value;
      +}
      +
     @@ strmap.h: static inline unsigned int strmap_get_size(struct strmap *map)
      +	strmap_put(&map->map, str, (void *)v);
      +}
      +
     ++/*
     ++ * Increment the value for str by amt.  If str isn't in the map, add it and
     ++ * set its value to default_value + amt.
     ++ */
      +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
      +
       #endif /* STRMAP_H */
  9:  490d3a42ad ! 10:  0f57735f5e strmap: add a strset sub-type
     @@ Commit message
      
          The difference in usage also results in some differences in API: a few
          things that aren't necessary or meaningful are dropped (namely, the
     -    free_util argument to *_clear(), and the *_get() function), and
     +    free_values argument to *_clear(), and the *_get() function), and
          strset_add() is chosen as the API instead of strset_put().
      
     +    Finally, shortlog already had a more minimal strset API; so this adds a
     +    strset_check_and_add() function for its benefit to allow it to switch
     +    over to this strset implementation.
     +
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     + ## strmap.c ##
     +@@ strmap.c: void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
     + 	else
     + 		strintmap_set(map, str, map->default_value + amt);
     + }
     ++
     ++int strset_check_and_add(struct strset *set, const char *str)
     ++{
     ++	if (strset_contains(set, str))
     ++		return 1;
     ++	strset_add(set, str);
     ++	return 0;
     ++}
     +
       ## strmap.h ##
     -@@ strmap.h: static inline void strintmap_set(struct strintmap *map, const char *str,
     +@@ strmap.h: int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
     + 			.map.strdup_strings = 1,                          \
     + 			.default_value = 0,                               \
     + 		    }
     ++#define STRSET_INIT { \
     ++			.map.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
     ++			.map.strdup_strings = 1,                          \
     ++		    }
       
     + /*
     +  * Initialize the members of the strmap.  Any keys added to the strmap will
     +@@ strmap.h: static inline void strintmap_set(struct strintmap *map, const char *str,
     +  */
       void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
       
      +/*
     @@ strmap.h: static inline void strintmap_set(struct strintmap *map, const char *st
      +	strmap_init(&set->map);
      +}
      +
     -+static inline void strset_ocd_init(struct strset *set,
     -+				   int strdup_strings)
     ++static inline void strset_init_with_options(struct strset *set,
     ++					    int strdup_strings)
      +{
     -+	strmap_ocd_init(&set->map, strdup_strings);
     ++	strmap_init_with_options(&set->map, strdup_strings);
      +}
      +
      +static inline void strset_clear(struct strset *set)
     @@ strmap.h: static inline void strintmap_set(struct strintmap *map, const char *st
      +{
      +	strmap_put(&set->map, str, NULL);
      +}
     ++
     ++/* Returns 1 if str already in set.  Otherwise adds str to set and returns 0 */
     ++int strset_check_and_add(struct strset *set, const char *str);
      +
       #endif /* STRMAP_H */
 10:  eca4f1ddba ! 11:  980537e877 strmap: enable allocations to come from a mem_pool
     @@ Commit message
      
          For heavy users of strmaps, allowing the keys and entries to be
          allocated from a memory pool can provide significant overhead savings.
     -    Add an option to strmap_ocd_init() to specify a memory pool.
     +    Add an option to strmap_init_with_options() to specify a memory pool.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ strmap.c
       #include "strmap.h"
      +#include "mem-pool.h"
       
     - static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
     - 			    const struct hashmap_entry *entry1,
     + int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
     + 		     const struct hashmap_entry *entry1,
      @@ strmap.c: static struct strmap_entry *find_strmap_entry(struct strmap *map,
       
       void strmap_init(struct strmap *map)
       {
     --	strmap_ocd_init(map, 1);
     -+	strmap_ocd_init(map, NULL, 1);
     +-	strmap_init_with_options(map, 1);
     ++	strmap_init_with_options(map, NULL, 1);
       }
       
     - void strmap_ocd_init(struct strmap *map,
     -+		     struct mem_pool *pool,
     - 		     int strdup_strings)
     + void strmap_init_with_options(struct strmap *map,
     ++			      struct mem_pool *pool,
     + 			      int strdup_strings)
       {
       	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
      +	map->pool = pool;
       	map->strdup_strings = strdup_strings;
       }
       
     -@@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_util)
     +@@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_values)
       	if (!map)
       		return;
       
     -+	if (!free_util && map->pool)
     ++	if (!free_values && map->pool)
      +		/* Memory other than util is owned by and freed with the pool */
      +		return;
      +
       	/*
       	 * We need to iterate over the hashmap entries and free
       	 * e->key and e->value ourselves; hashmap has no API to
     -@@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_util)
     +@@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_values)
       	hashmap_for_each_entry(&map->map, &iter, e, ent) {
     - 		if (free_util)
     + 		if (free_values)
       			free(e->value);
      -		if (map->strdup_strings)
      -			free((char*)e->key);
     @@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_util)
       }
       
      @@ strmap.c: void *strmap_put(struct strmap *map, const char *str, void *data)
     - 		 */
     + 	} else {
       		const char *key = str;
       
      -		entry = xmalloc(sizeof(*entry));
     @@ strmap.c: void *strmap_put(struct strmap *map, const char *str, void *data)
       		entry->key = key;
       		entry->value = data;
       		hashmap_add(&map->map, &entry->ent);
     -@@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_util)
     +@@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_value)
       		return;
     - 	if (free_util)
     + 	if (free_value)
       		free(ret->value);
      -	if (map->strdup_strings)
      -		free((char*)ret->key);
     @@ strmap.h: void strmap_init(struct strmap *map);
        * Same as strmap_init, but for those who want to control the memory management
      - * carefully instead of using the default of strdup_strings=1.
      + * carefully instead of using the default of strdup_strings=1 and pool=NULL.
     -  * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
     -  * are obsessing over minor details.)
        */
     - void strmap_ocd_init(struct strmap *map,
     -+		     struct mem_pool *pool,
     - 		     int strdup_strings);
     + void strmap_init_with_options(struct strmap *map,
     ++			      struct mem_pool *pool,
     + 			      int strdup_strings);
       
       /*
     -@@ strmap.h: static inline void strintmap_init(struct strintmap *map)
     - }
     +@@ strmap.h: static inline void strintmap_init(struct strintmap *map, int default_value)
       
     - static inline void strintmap_ocd_init(struct strintmap *map,
     -+				      struct mem_pool *pool,
     - 				      int strdup_strings)
     + static inline void strintmap_init_with_options(struct strintmap *map,
     + 					       int default_value,
     ++					       struct mem_pool *pool,
     + 					       int strdup_strings)
       {
     --	strmap_ocd_init(&map->map, strdup_strings);
     -+	strmap_ocd_init(&map->map, pool, strdup_strings);
     +-	strmap_init_with_options(&map->map, strdup_strings);
     ++	strmap_init_with_options(&map->map, pool, strdup_strings);
     + 	map->default_value = default_value;
       }
       
     - static inline void strintmap_clear(struct strintmap *map)
      @@ strmap.h: static inline void strset_init(struct strset *set)
       }
       
     - static inline void strset_ocd_init(struct strset *set,
     -+				   struct mem_pool *pool,
     - 				   int strdup_strings)
     + static inline void strset_init_with_options(struct strset *set,
     ++					    struct mem_pool *pool,
     + 					    int strdup_strings)
       {
     --	strmap_ocd_init(&set->map, strdup_strings);
     -+	strmap_ocd_init(&set->map, pool, strdup_strings);
     +-	strmap_init_with_options(&set->map, strdup_strings);
     ++	strmap_init_with_options(&set->map, pool, strdup_strings);
       }
       
       static inline void strset_clear(struct strset *set)
  -:  ---------- > 12:  7f93cbb525 strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  -:  ---------- > 13:  5f41fc63e5 Use new HASHMAP_INIT macro to simplify hashmap initialization

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
                       ` (13 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The existence of hashmap_free() and hashmap_free_entries() confused me,
and the docs weren't clear enough.  We are dealing with a map table,
entries in that table, and possibly also things each of those entries
point to.  I had to consult other source code examples and the
implementation.  Add a brief note to clarify the differences.  This will
become even more important once we introduce a new
hashmap_partial_clear() function which will add the question of whether
the table itself has been freed.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.h | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/hashmap.h b/hashmap.h
index b011b394fe..2994dc7a9c 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -236,13 +236,40 @@ void hashmap_init(struct hashmap *map,
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
- * Frees a hashmap structure and allocated memory, leaves entries undisturbed
+ * Frees a hashmap structure and allocated memory for the table, but does not
+ * free the entries nor anything they point to.
+ *
+ * Usage note:
+ *
+ * Many callers will need to iterate over all entries and free the data each
+ * entry points to; in such a case, they can free the entry itself while at it.
+ * Thus, you might see:
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *      free(e);
+ *    }
+ *    hashmap_free(map);
+ *
+ * instead of
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *    }
+ *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ *
+ * to avoid the implicit extra loop over the entries.  However, if there are
+ * no special fields in your entry that need to be freed beyond the entry
+ * itself, it is probably simpler to avoid the explicit loop and just call
+ * hashmap_free_entries().
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map
+ * where @member is the hashmap_entry struct used to associate with @map.
+ *
+ * See usage note above hashmap_free().
  */
 #define hashmap_free_entries(map, type, member) \
 	hashmap_free_(map, offsetof(type, member));
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 02/13] hashmap: adjust spacing to fix argument alignment
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
                       ` (12 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

No actual code changes; just whitespace adjustments.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 17 +++++++++--------
 hashmap.h | 22 +++++++++++-----------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index 09813e1a46..e44d8a3e85 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -92,8 +92,9 @@ static void alloc_table(struct hashmap *map, unsigned int size)
 }
 
 static inline int entry_equals(const struct hashmap *map,
-		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
-		const void *keydata)
+			       const struct hashmap_entry *e1,
+			       const struct hashmap_entry *e2,
+			       const void *keydata)
 {
 	return (e1 == e2) ||
 	       (e1->hash == e2->hash &&
@@ -101,7 +102,7 @@ static inline int entry_equals(const struct hashmap *map,
 }
 
 static inline unsigned int bucket(const struct hashmap *map,
-		const struct hashmap_entry *key)
+				  const struct hashmap_entry *key)
 {
 	return key->hash & (map->tablesize - 1);
 }
@@ -148,7 +149,7 @@ static int always_equal(const void *unused_cmp_data,
 }
 
 void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		const void *cmpfn_data, size_t initial_size)
+		  const void *cmpfn_data, size_t initial_size)
 {
 	unsigned int size = HASHMAP_INITIAL_SIZE;
 
@@ -199,7 +200,7 @@ struct hashmap_entry *hashmap_get(const struct hashmap *map,
 }
 
 struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
-			const struct hashmap_entry *entry)
+				       const struct hashmap_entry *entry)
 {
 	struct hashmap_entry *e = entry->next;
 	for (; e; e = e->next)
@@ -225,8 +226,8 @@ void hashmap_add(struct hashmap *map, struct hashmap_entry *entry)
 }
 
 struct hashmap_entry *hashmap_remove(struct hashmap *map,
-					const struct hashmap_entry *key,
-					const void *keydata)
+				     const struct hashmap_entry *key,
+				     const void *keydata)
 {
 	struct hashmap_entry *old;
 	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
@@ -249,7 +250,7 @@ struct hashmap_entry *hashmap_remove(struct hashmap *map,
 }
 
 struct hashmap_entry *hashmap_put(struct hashmap *map,
-				struct hashmap_entry *entry)
+				  struct hashmap_entry *entry)
 {
 	struct hashmap_entry *old = hashmap_remove(map, entry, NULL);
 	hashmap_add(map, entry);
diff --git a/hashmap.h b/hashmap.h
index 2994dc7a9c..904f61d6e1 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -228,9 +228,9 @@ struct hashmap {
  * prevent expensive resizing. If 0, the table is dynamically resized.
  */
 void hashmap_init(struct hashmap *map,
-			 hashmap_cmp_fn equals_function,
-			 const void *equals_function_data,
-			 size_t initial_size);
+		  hashmap_cmp_fn equals_function,
+		  const void *equals_function_data,
+		  size_t initial_size);
 
 /* internal function for freeing hashmap */
 void hashmap_free_(struct hashmap *map, ssize_t offset);
@@ -288,7 +288,7 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  * and if it is on stack, you can just let it go out of scope).
  */
 static inline void hashmap_entry_init(struct hashmap_entry *e,
-					unsigned int hash)
+				      unsigned int hash)
 {
 	e->hash = hash;
 	e->next = NULL;
@@ -330,8 +330,8 @@ static inline unsigned int hashmap_get_size(struct hashmap *map)
  * to `hashmap_cmp_fn` to decide whether the entry matches the key.
  */
 struct hashmap_entry *hashmap_get(const struct hashmap *map,
-				const struct hashmap_entry *key,
-				const void *keydata);
+				  const struct hashmap_entry *key,
+				  const void *keydata);
 
 /*
  * Returns the hashmap entry for the specified hash code and key data,
@@ -364,7 +364,7 @@ static inline struct hashmap_entry *hashmap_get_from_hash(
  * call to `hashmap_get` or `hashmap_get_next`.
  */
 struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
-			const struct hashmap_entry *entry);
+				       const struct hashmap_entry *entry);
 
 /*
  * Adds a hashmap entry. This allows to add duplicate entries (i.e.
@@ -384,7 +384,7 @@ void hashmap_add(struct hashmap *map, struct hashmap_entry *entry);
  * Returns the replaced entry, or NULL if not found (i.e. the entry was added).
  */
 struct hashmap_entry *hashmap_put(struct hashmap *map,
-				struct hashmap_entry *entry);
+				  struct hashmap_entry *entry);
 
 /*
  * Adds or replaces a hashmap entry contained within @keyvar,
@@ -406,8 +406,8 @@ struct hashmap_entry *hashmap_put(struct hashmap *map,
  * Argument explanation is the same as in `hashmap_get`.
  */
 struct hashmap_entry *hashmap_remove(struct hashmap *map,
-					const struct hashmap_entry *key,
-					const void *keydata);
+				     const struct hashmap_entry *key,
+				     const void *keydata);
 
 /*
  * Removes a hashmap entry contained within @keyvar,
@@ -449,7 +449,7 @@ struct hashmap_entry *hashmap_iter_next(struct hashmap_iter *iter);
 
 /* Initializes the iterator and returns the first entry, if any. */
 static inline struct hashmap_entry *hashmap_iter_first(struct hashmap *map,
-		struct hashmap_iter *iter)
+						       struct hashmap_iter *iter)
 {
 	hashmap_iter_init(map, iter);
 	return hashmap_iter_next(iter);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 03/13] hashmap: allow re-use after hashmap_free()
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
                       ` (11 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Previously, once map->table had been freed, any calls to hashmap_put(),
hashmap_get(), or hashmap_remove() would cause a NULL pointer
dereference (since hashmap_free_() also zeros the memory; without that
zeroing, calling these functions would cause a use-after-free problem).

Modify these functions to check for a NULL table and automatically
allocate as needed.

Also add a HASHMAP_INIT(fn, data) macro for initializing hashmaps on the
stack without calling hashmap_init().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 16 ++++++++++++++--
 hashmap.h |  3 +++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index e44d8a3e85..bb7c9979b8 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -114,6 +114,7 @@ int hashmap_bucket(const struct hashmap *map, unsigned int hash)
 
 static void rehash(struct hashmap *map, unsigned int newsize)
 {
+	/* map->table MUST NOT be NULL when this function is called */
 	unsigned int i, oldsize = map->tablesize;
 	struct hashmap_entry **oldtable = map->table;
 
@@ -134,6 +135,7 @@ static void rehash(struct hashmap *map, unsigned int newsize)
 static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
 		const struct hashmap_entry *key, const void *keydata)
 {
+	/* map->table MUST NOT be NULL when this function is called */
 	struct hashmap_entry **e = &map->table[bucket(map, key)];
 	while (*e && !entry_equals(map, *e, key, keydata))
 		e = &(*e)->next;
@@ -196,6 +198,8 @@ struct hashmap_entry *hashmap_get(const struct hashmap *map,
 				const struct hashmap_entry *key,
 				const void *keydata)
 {
+	if (!map->table)
+		return NULL;
 	return *find_entry_ptr(map, key, keydata);
 }
 
@@ -211,8 +215,12 @@ struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
 
 void hashmap_add(struct hashmap *map, struct hashmap_entry *entry)
 {
-	unsigned int b = bucket(map, entry);
+	unsigned int b;
+
+	if (!map->table)
+		alloc_table(map, HASHMAP_INITIAL_SIZE);
 
+	b = bucket(map, entry);
 	/* add entry */
 	entry->next = map->table[b];
 	map->table[b] = entry;
@@ -230,7 +238,11 @@ struct hashmap_entry *hashmap_remove(struct hashmap *map,
 				     const void *keydata)
 {
 	struct hashmap_entry *old;
-	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
+	struct hashmap_entry **e;
+
+	if (!map->table)
+		return NULL;
+	e = find_entry_ptr(map, key, keydata);
 	if (!*e)
 		return NULL;
 
diff --git a/hashmap.h b/hashmap.h
index 904f61d6e1..3b0f2bcade 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -210,6 +210,9 @@ struct hashmap {
 
 /* hashmap functions */
 
+#define HASHMAP_INIT(fn, data) { .cmpfn = fn, .cmpfn_data = data, \
+				 .do_count_items = 1 }
+
 /*
  * Initializes a hashmap structure.
  *
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 04/13] hashmap: introduce a new hashmap_partial_clear()
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (2 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
                       ` (10 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge-ort is a heavy user of strmaps, which are built on hashmap.[ch].
clear_or_reinit_internal_opts() in merge-ort was taking about 12% of
overall runtime in my testcase involving rebasing 35 patches of
linux.git across a big rename.  clear_or_reinit_internal_opts() was
calling hashmap_free() followed by hashmap_init(), meaning that not only
was it freeing all the memory associated with each of the strmaps just
to immediately allocate a new array again, it was allocating a new array
that was likely smaller than needed (thus resulting in later need to
rehash things).  The ending size of the map table on the previous commit
was likely almost perfectly sized for the next commit we wanted to pick,
and not dropping and reallocating the table immediately is a win.

Add some new API to hashmap to clear a hashmap of entries without
freeing map->table (and instead only zeroing it out like alloc_table()
would do, along with zeroing the count of items in the table and the
shrink_at field).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 39 +++++++++++++++++++++++++++------------
 hashmap.h | 13 ++++++++++++-
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index bb7c9979b8..922ed07954 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -174,22 +174,37 @@ void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
 	map->do_count_items = 1;
 }
 
+static void free_individual_entries(struct hashmap *map, ssize_t entry_offset)
+{
+	struct hashmap_iter iter;
+	struct hashmap_entry *e;
+
+	hashmap_iter_init(map, &iter);
+	while ((e = hashmap_iter_next(&iter)))
+		/*
+		 * like container_of, but using caller-calculated
+		 * offset (caller being hashmap_free_entries)
+		 */
+		free((char *)e - entry_offset);
+}
+
+void hashmap_partial_clear_(struct hashmap *map, ssize_t entry_offset)
+{
+	if (!map || !map->table)
+		return;
+	if (entry_offset >= 0)  /* called by hashmap_clear_entries */
+		free_individual_entries(map, entry_offset);
+	memset(map->table, 0, map->tablesize * sizeof(struct hashmap_entry *));
+	map->shrink_at = 0;
+	map->private_size = 0;
+}
+
 void hashmap_free_(struct hashmap *map, ssize_t entry_offset)
 {
 	if (!map || !map->table)
 		return;
-	if (entry_offset >= 0) { /* called by hashmap_free_entries */
-		struct hashmap_iter iter;
-		struct hashmap_entry *e;
-
-		hashmap_iter_init(map, &iter);
-		while ((e = hashmap_iter_next(&iter)))
-			/*
-			 * like container_of, but using caller-calculated
-			 * offset (caller being hashmap_free_entries)
-			 */
-			free((char *)e - entry_offset);
-	}
+	if (entry_offset >= 0)  /* called by hashmap_free_entries */
+		free_individual_entries(map, entry_offset);
 	free(map->table);
 	memset(map, 0, sizeof(*map));
 }
diff --git a/hashmap.h b/hashmap.h
index 3b0f2bcade..e9430d582a 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -235,7 +235,8 @@ void hashmap_init(struct hashmap *map,
 		  const void *equals_function_data,
 		  size_t initial_size);
 
-/* internal function for freeing hashmap */
+/* internal functions for clearing or freeing hashmap */
+void hashmap_partial_clear_(struct hashmap *map, ssize_t offset);
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
@@ -268,6 +269,16 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
+/*
+ * Basically the same as calling hashmap_free() followed by hashmap_init(),
+ * but doesn't incur the overhead of deallocating and reallocating
+ * map->table; it leaves map->table allocated and the same size but zeroes
+ * it out so it's ready for use again as an empty map.  As with
+ * hashmap_free(), you may need to free the entries yourself before calling
+ * this function.
+ */
+#define hashmap_partial_clear(map) hashmap_partial_clear_(map, -1)
+
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
  * where @member is the hashmap_entry struct used to associate with @map.
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 05/13] hashmap: provide deallocation function names
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (3 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 06/13] strmap: new utility functions Elijah Newren via GitGitGadget
                       ` (9 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported.  Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over
to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 add-interactive.c       |  2 +-
 blame.c                 |  2 +-
 bloom.c                 |  2 +-
 builtin/fetch.c         |  6 +++---
 builtin/shortlog.c      |  2 +-
 config.c                |  2 +-
 diff.c                  |  4 ++--
 diffcore-rename.c       |  2 +-
 dir.c                   |  8 ++++----
 hashmap.c               |  6 +++---
 hashmap.h               | 44 +++++++++++++++++++++++++----------------
 merge-recursive.c       |  6 +++---
 name-hash.c             |  4 ++--
 object.c                |  2 +-
 oidmap.c                |  2 +-
 patch-ids.c             |  2 +-
 range-diff.c            |  2 +-
 ref-filter.c            |  2 +-
 revision.c              |  2 +-
 sequencer.c             |  4 ++--
 submodule-config.c      |  4 ++--
 t/helper/test-hashmap.c |  6 +++---
 22 files changed, 63 insertions(+), 53 deletions(-)

diff --git a/add-interactive.c b/add-interactive.c
index 555c4abf32..a14c0feaa2 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -557,7 +557,7 @@ static int get_modified_files(struct repository *r,
 		if (ps)
 			clear_pathspec(&rev.prune_data);
 	}
-	hashmap_free_entries(&s.file_map, struct pathname_entry, ent);
+	hashmap_clear_and_free(&s.file_map, struct pathname_entry, ent);
 	if (unmerged_count)
 		*unmerged_count = s.unmerged_count;
 	if (binary_count)
diff --git a/blame.c b/blame.c
index 686845b2b4..229beb6452 100644
--- a/blame.c
+++ b/blame.c
@@ -435,7 +435,7 @@ static void get_fingerprint(struct fingerprint *result,
 
 static void free_fingerprint(struct fingerprint *f)
 {
-	hashmap_free(&f->map);
+	hashmap_clear(&f->map);
 	free(f->entries);
 }
 
diff --git a/bloom.c b/bloom.c
index 68c73200a5..719c313a1c 100644
--- a/bloom.c
+++ b/bloom.c
@@ -287,7 +287,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 		}
 
 	cleanup:
-		hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry);
+		hashmap_clear_and_free(&pathmap, struct pathmap_hash_entry, entry);
 	} else {
 		for (i = 0; i < diff_queued_diff.nr; i++)
 			diff_free_filepair(diff_queued_diff.queue[i]);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f9c3c49f14..ecf8537605 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -393,7 +393,7 @@ static void find_non_local_tags(const struct ref *refs,
 		item = refname_hash_add(&remote_refs, ref->name, &ref->old_oid);
 		string_list_insert(&remote_refs_list, ref->name);
 	}
-	hashmap_free_entries(&existing_refs, struct refname_hash_entry, ent);
+	hashmap_clear_and_free(&existing_refs, struct refname_hash_entry, ent);
 
 	/*
 	 * We may have a final lightweight tag that needs to be
@@ -428,7 +428,7 @@ static void find_non_local_tags(const struct ref *refs,
 		**tail = rm;
 		*tail = &rm->next;
 	}
-	hashmap_free_entries(&remote_refs, struct refname_hash_entry, ent);
+	hashmap_clear_and_free(&remote_refs, struct refname_hash_entry, ent);
 	string_list_clear(&remote_refs_list, 0);
 	oidset_clear(&fetch_oids);
 }
@@ -573,7 +573,7 @@ static struct ref *get_ref_map(struct remote *remote,
 		}
 	}
 	if (existing_refs_populated)
-		hashmap_free_entries(&existing_refs, struct refname_hash_entry, ent);
+		hashmap_clear_and_free(&existing_refs, struct refname_hash_entry, ent);
 
 	return ref_map;
 }
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 0a5c4968f6..83f0a739b4 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -220,7 +220,7 @@ static void strset_clear(struct strset *ss)
 {
 	if (!ss->map.table)
 		return;
-	hashmap_free_entries(&ss->map, struct strset_item, ent);
+	hashmap_clear_and_free(&ss->map, struct strset_item, ent);
 }
 
 static void insert_records_from_trailers(struct shortlog *log,
diff --git a/config.c b/config.c
index 2bdff4457b..8f324ed3a6 100644
--- a/config.c
+++ b/config.c
@@ -1963,7 +1963,7 @@ void git_configset_clear(struct config_set *cs)
 		free(entry->key);
 		string_list_clear(&entry->value_list, 1);
 	}
-	hashmap_free_entries(&cs->config_hash, struct config_set_element, ent);
+	hashmap_clear_and_free(&cs->config_hash, struct config_set_element, ent);
 	cs->hash_initialized = 0;
 	free(cs->list.items);
 	cs->list.nr = 0;
diff --git a/diff.c b/diff.c
index 2bb2f8f57e..8e0e59f5cf 100644
--- a/diff.c
+++ b/diff.c
@@ -6289,9 +6289,9 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
 				dim_moved_lines(o);
 
-			hashmap_free_entries(&add_lines, struct moved_entry,
+			hashmap_clear_and_free(&add_lines, struct moved_entry,
 						ent);
-			hashmap_free_entries(&del_lines, struct moved_entry,
+			hashmap_clear_and_free(&del_lines, struct moved_entry,
 						ent);
 		}
 
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 99e63e90f8..d367a6d244 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -407,7 +407,7 @@ static int find_exact_renames(struct diff_options *options)
 		renames += find_identical_files(&file_table, i, options);
 
 	/* Free the hash data structure and entries */
-	hashmap_free_entries(&file_table, struct file_similarity, entry);
+	hashmap_clear_and_free(&file_table, struct file_similarity, entry);
 
 	return renames;
 }
diff --git a/dir.c b/dir.c
index 78387110e6..161dce121e 100644
--- a/dir.c
+++ b/dir.c
@@ -817,8 +817,8 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 
 clear_hashmaps:
 	warning(_("disabling cone pattern matching"));
-	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
-	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->parent_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->recursive_hashmap, struct pattern_entry, ent);
 	pl->use_cone_patterns = 0;
 }
 
@@ -921,8 +921,8 @@ void clear_pattern_list(struct pattern_list *pl)
 		free(pl->patterns[i]);
 	free(pl->patterns);
 	free(pl->filebuf);
-	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
-	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->recursive_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->parent_hashmap, struct pattern_entry, ent);
 
 	memset(pl, 0, sizeof(*pl));
 }
diff --git a/hashmap.c b/hashmap.c
index 922ed07954..5009471800 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -183,7 +183,7 @@ static void free_individual_entries(struct hashmap *map, ssize_t entry_offset)
 	while ((e = hashmap_iter_next(&iter)))
 		/*
 		 * like container_of, but using caller-calculated
-		 * offset (caller being hashmap_free_entries)
+		 * offset (caller being hashmap_clear_and_free)
 		 */
 		free((char *)e - entry_offset);
 }
@@ -199,11 +199,11 @@ void hashmap_partial_clear_(struct hashmap *map, ssize_t entry_offset)
 	map->private_size = 0;
 }
 
-void hashmap_free_(struct hashmap *map, ssize_t entry_offset)
+void hashmap_clear_(struct hashmap *map, ssize_t entry_offset)
 {
 	if (!map || !map->table)
 		return;
-	if (entry_offset >= 0)  /* called by hashmap_free_entries */
+	if (entry_offset >= 0)  /* called by hashmap_clear_and_free */
 		free_individual_entries(map, entry_offset);
 	free(map->table);
 	memset(map, 0, sizeof(*map));
diff --git a/hashmap.h b/hashmap.h
index e9430d582a..7251687d73 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -96,7 +96,7 @@
  *         }
  *
  *         if (!strcmp("end", action)) {
- *             hashmap_free_entries(&map, struct long2string, ent);
+ *             hashmap_clear_and_free(&map, struct long2string, ent);
  *             break;
  *         }
  *     }
@@ -237,7 +237,7 @@ void hashmap_init(struct hashmap *map,
 
 /* internal functions for clearing or freeing hashmap */
 void hashmap_partial_clear_(struct hashmap *map, ssize_t offset);
-void hashmap_free_(struct hashmap *map, ssize_t offset);
+void hashmap_clear_(struct hashmap *map, ssize_t offset);
 
 /*
  * Frees a hashmap structure and allocated memory for the table, but does not
@@ -253,40 +253,50 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  *      free(e->somefield);
  *      free(e);
  *    }
- *    hashmap_free(map);
+ *    hashmap_clear(map);
  *
  * instead of
  *
  *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
  *      free(e->somefield);
  *    }
- *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ *    hashmap_clear_and_free(map, struct my_entry_struct, hashmap_entry_name);
  *
  * to avoid the implicit extra loop over the entries.  However, if there are
  * no special fields in your entry that need to be freed beyond the entry
  * itself, it is probably simpler to avoid the explicit loop and just call
- * hashmap_free_entries().
+ * hashmap_clear_and_free().
  */
-#define hashmap_free(map) hashmap_free_(map, -1)
+#define hashmap_clear(map) hashmap_clear_(map, -1)
 
 /*
- * Basically the same as calling hashmap_free() followed by hashmap_init(),
- * but doesn't incur the overhead of deallocating and reallocating
- * map->table; it leaves map->table allocated and the same size but zeroes
- * it out so it's ready for use again as an empty map.  As with
- * hashmap_free(), you may need to free the entries yourself before calling
- * this function.
+ * Similar to hashmap_clear(), except that the table is no deallocated; it
+ * is merely zeroed out but left the same size as before.  If the hashmap
+ * will be reused, this avoids the overhead of deallocating and
+ * reallocating map->table.  As with hashmap_clear(), you may need to free
+ * the entries yourself before calling this function.
  */
 #define hashmap_partial_clear(map) hashmap_partial_clear_(map, -1)
 
 /*
- * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map.
+ * Similar to hashmap_clear() but also frees all entries.  @type is the
+ * struct type of the entry where @member is the hashmap_entry struct used
+ * to associate with @map.
  *
- * See usage note above hashmap_free().
+ * See usage note above hashmap_clear().
  */
-#define hashmap_free_entries(map, type, member) \
-	hashmap_free_(map, offsetof(type, member));
+#define hashmap_clear_and_free(map, type, member) \
+	hashmap_clear_(map, offsetof(type, member))
+
+/*
+ * Similar to hashmap_partial_clear() but also frees all entries.  @type is
+ * the struct type of the entry where @member is the hashmap_entry struct
+ * used to associate with @map.
+ *
+ * See usage note above hashmap_clear().
+ */
+#define hashmap_partial_clear_and_free(map, type, member) \
+	hashmap_partial_clear_(map, offsetof(type, member))
 
 /* hashmap_entry functions */
 
diff --git a/merge-recursive.c b/merge-recursive.c
index d0214335a7..f736a0f632 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2651,7 +2651,7 @@ static struct string_list *get_renames(struct merge_options *opt,
 		free(e->target_file);
 		string_list_clear(&e->source_files, 0);
 	}
-	hashmap_free_entries(&collisions, struct collision_entry, ent);
+	hashmap_clear_and_free(&collisions, struct collision_entry, ent);
 	return renames;
 }
 
@@ -2870,7 +2870,7 @@ static void initial_cleanup_rename(struct diff_queue_struct *pairs,
 		strbuf_release(&e->new_dir);
 		/* possible_new_dirs already cleared in get_directory_renames */
 	}
-	hashmap_free_entries(dir_renames, struct dir_rename_entry, ent);
+	hashmap_clear_and_free(dir_renames, struct dir_rename_entry, ent);
 	free(dir_renames);
 
 	free(pairs->queue);
@@ -3497,7 +3497,7 @@ static int merge_trees_internal(struct merge_options *opt,
 		string_list_clear(entries, 1);
 		free(entries);
 
-		hashmap_free_entries(&opt->priv->current_file_dir_set,
+		hashmap_clear_and_free(&opt->priv->current_file_dir_set,
 					struct path_hashmap_entry, e);
 
 		if (clean < 0) {
diff --git a/name-hash.c b/name-hash.c
index fb526a3775..5d3c7b12c1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -726,6 +726,6 @@ void free_name_hash(struct index_state *istate)
 		return;
 	istate->name_hash_initialized = 0;
 
-	hashmap_free(&istate->name_hash);
-	hashmap_free_entries(&istate->dir_hash, struct dir_entry, ent);
+	hashmap_clear(&istate->name_hash);
+	hashmap_clear_and_free(&istate->dir_hash, struct dir_entry, ent);
 }
diff --git a/object.c b/object.c
index 3257518656..b8406409d5 100644
--- a/object.c
+++ b/object.c
@@ -532,7 +532,7 @@ void raw_object_store_clear(struct raw_object_store *o)
 	close_object_store(o);
 	o->packed_git = NULL;
 
-	hashmap_free(&o->pack_map);
+	hashmap_clear(&o->pack_map);
 }
 
 void parsed_object_pool_clear(struct parsed_object_pool *o)
diff --git a/oidmap.c b/oidmap.c
index 423aa014a3..286a04a53c 100644
--- a/oidmap.c
+++ b/oidmap.c
@@ -27,7 +27,7 @@ void oidmap_free(struct oidmap *map, int free_entries)
 		return;
 
 	/* TODO: make oidmap itself not depend on struct layouts */
-	hashmap_free_(&map->map, free_entries ? 0 : -1);
+	hashmap_clear_(&map->map, free_entries ? 0 : -1);
 }
 
 void *oidmap_get(const struct oidmap *map, const struct object_id *key)
diff --git a/patch-ids.c b/patch-ids.c
index 12aa6d494b..21973e4933 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -71,7 +71,7 @@ int init_patch_ids(struct repository *r, struct patch_ids *ids)
 
 int free_patch_ids(struct patch_ids *ids)
 {
-	hashmap_free_entries(&ids->patches, struct patch_id, ent);
+	hashmap_clear_and_free(&ids->patches, struct patch_id, ent);
 	return 0;
 }
 
diff --git a/range-diff.c b/range-diff.c
index 24dc435e48..befeecae44 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -266,7 +266,7 @@ static void find_exact_matches(struct string_list *a, struct string_list *b)
 		}
 	}
 
-	hashmap_free(&map);
+	hashmap_clear(&map);
 }
 
 static void diffsize_consume(void *data, char *line, unsigned long len)
diff --git a/ref-filter.c b/ref-filter.c
index c62f6b4822..5e66b8cd76 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2222,7 +2222,7 @@ void ref_array_clear(struct ref_array *array)
 	used_atom_cnt = 0;
 
 	if (ref_to_worktree_map.worktrees) {
-		hashmap_free_entries(&(ref_to_worktree_map.map),
+		hashmap_clear_and_free(&(ref_to_worktree_map.map),
 					struct ref_to_worktree_entry, ent);
 		free_worktrees(ref_to_worktree_map.worktrees);
 		ref_to_worktree_map.worktrees = NULL;
diff --git a/revision.c b/revision.c
index aa62212040..f27649d45d 100644
--- a/revision.c
+++ b/revision.c
@@ -139,7 +139,7 @@ static void paths_and_oids_clear(struct hashmap *map)
 		free(entry->path);
 	}
 
-	hashmap_free_entries(map, struct path_and_oids_entry, ent);
+	hashmap_clear_and_free(map, struct path_and_oids_entry, ent);
 }
 
 static void paths_and_oids_insert(struct hashmap *map,
diff --git a/sequencer.c b/sequencer.c
index 00acb12496..23a09c3e7a 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -5058,7 +5058,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 
 	oidmap_free(&commit2todo, 1);
 	oidmap_free(&state.commit2label, 1);
-	hashmap_free_entries(&state.labels, struct labels_entry, entry);
+	hashmap_clear_and_free(&state.labels, struct labels_entry, entry);
 	strbuf_release(&state.buf);
 
 	return 0;
@@ -5577,7 +5577,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 	for (i = 0; i < todo_list->nr; i++)
 		free(subjects[i]);
 	free(subjects);
-	hashmap_free_entries(&subject2item, struct subject2item_entry, entry);
+	hashmap_clear_and_free(&subject2item, struct subject2item_entry, entry);
 
 	clear_commit_todo_item(&commit_todo);
 
diff --git a/submodule-config.c b/submodule-config.c
index c569e22aa3..f502505566 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -103,8 +103,8 @@ static void submodule_cache_clear(struct submodule_cache *cache)
 				ent /* member name */)
 		free_one_config(entry);
 
-	hashmap_free_entries(&cache->for_path, struct submodule_entry, ent);
-	hashmap_free_entries(&cache->for_name, struct submodule_entry, ent);
+	hashmap_clear_and_free(&cache->for_path, struct submodule_entry, ent);
+	hashmap_clear_and_free(&cache->for_name, struct submodule_entry, ent);
 	cache->initialized = 0;
 	cache->gitmodules_read = 0;
 }
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index f38706216f..2475663b49 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -110,7 +110,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 				hashmap_add(&map, &entries[i]->ent);
 			}
 
-			hashmap_free(&map);
+			hashmap_clear(&map);
 		}
 	} else {
 		/* test map lookups */
@@ -130,7 +130,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 			}
 		}
 
-		hashmap_free(&map);
+		hashmap_clear(&map);
 	}
 }
 
@@ -262,6 +262,6 @@ int cmd__hashmap(int argc, const char **argv)
 	}
 
 	strbuf_release(&line);
-	hashmap_free_entries(&map, struct test_entry, ent);
+	hashmap_clear_and_free(&map, struct test_entry, ent);
 	return 0;
 }
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 06/13] strmap: new utility functions
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (4 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 07/13] strmap: add more " Elijah Newren via GitGitGadget
                       ` (8 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add strmap as a new struct and associated utility functions,
specifically for hashmaps that map strings to some value.  The API is
taken directly from Peff's proposal at
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Note that similar string-list, I have a strdup_strings setting.
However, unlike string-list, strmap_init() does not take a parameter for
this setting and instead automatically sets it to 1; callers who want to
control this detail need to instead call strmap_init_with_options().
(Future patches will add additional parameters to
strmap_init_with_options()).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Makefile |  1 +
 strmap.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 strmap.h | 65 +++++++++++++++++++++++++++++++++++++
 3 files changed, 165 insertions(+)
 create mode 100644 strmap.c
 create mode 100644 strmap.h

diff --git a/Makefile b/Makefile
index 95571ee3fc..777a34c01c 100644
--- a/Makefile
+++ b/Makefile
@@ -1000,6 +1000,7 @@ LIB_OBJS += stable-qsort.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
+LIB_OBJS += strmap.o
 LIB_OBJS += strvec.o
 LIB_OBJS += sub-process.o
 LIB_OBJS += submodule-config.o
diff --git a/strmap.c b/strmap.c
new file mode 100644
index 0000000000..53f284eb20
--- /dev/null
+++ b/strmap.c
@@ -0,0 +1,99 @@
+#include "git-compat-util.h"
+#include "strmap.h"
+
+int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
+		     const struct hashmap_entry *entry1,
+		     const struct hashmap_entry *entry2,
+		     const void *keydata)
+{
+	const struct strmap_entry *e1, *e2;
+
+	e1 = container_of(entry1, const struct strmap_entry, ent);
+	e2 = container_of(entry2, const struct strmap_entry, ent);
+	return strcmp(e1->key, e2->key);
+}
+
+static struct strmap_entry *find_strmap_entry(struct strmap *map,
+					      const char *str)
+{
+	struct strmap_entry entry;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.key = str;
+	return hashmap_get_entry(&map->map, &entry, ent, NULL);
+}
+
+void strmap_init(struct strmap *map)
+{
+	strmap_init_with_options(map, 1);
+}
+
+void strmap_init_with_options(struct strmap *map,
+			      int strdup_strings)
+{
+	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
+	map->strdup_strings = strdup_strings;
+}
+
+static void strmap_free_entries_(struct strmap *map, int free_values)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+
+	if (!map)
+		return;
+
+	/*
+	 * We need to iterate over the hashmap entries and free
+	 * e->key and e->value ourselves; hashmap has no API to
+	 * take care of that for us.  Since we're already iterating over
+	 * the hashmap, though, might as well free e too and avoid the need
+	 * to make some call into the hashmap API to do that.
+	 */
+	hashmap_for_each_entry(&map->map, &iter, e, ent) {
+		if (free_values)
+			free(e->value);
+		if (map->strdup_strings)
+			free((char*)e->key);
+		free(e);
+	}
+}
+
+void strmap_clear(struct strmap *map, int free_values)
+{
+	strmap_free_entries_(map, free_values);
+	hashmap_clear(&map->map);
+}
+
+void *strmap_put(struct strmap *map, const char *str, void *data)
+{
+	struct strmap_entry *entry = find_strmap_entry(map, str);
+	void *old = NULL;
+
+	if (entry) {
+		old = entry->value;
+		entry->value = data;
+	} else {
+		const char *key = str;
+
+		entry = xmalloc(sizeof(*entry));
+		hashmap_entry_init(&entry->ent, strhash(str));
+
+		if (map->strdup_strings)
+			key = xstrdup(str);
+		entry->key = key;
+		entry->value = data;
+		hashmap_add(&map->map, &entry->ent);
+	}
+	return old;
+}
+
+void *strmap_get(struct strmap *map, const char *str)
+{
+	struct strmap_entry *entry = find_strmap_entry(map, str);
+	return entry ? entry->value : NULL;
+}
+
+int strmap_contains(struct strmap *map, const char *str)
+{
+	return find_strmap_entry(map, str) != NULL;
+}
diff --git a/strmap.h b/strmap.h
new file mode 100644
index 0000000000..96888c23ad
--- /dev/null
+++ b/strmap.h
@@ -0,0 +1,65 @@
+#ifndef STRMAP_H
+#define STRMAP_H
+
+#include "hashmap.h"
+
+struct strmap {
+	struct hashmap map;
+	unsigned int strdup_strings:1;
+};
+
+struct strmap_entry {
+	struct hashmap_entry ent;
+	const char *key;
+	void *value;
+};
+
+int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
+		     const struct hashmap_entry *entry1,
+		     const struct hashmap_entry *entry2,
+		     const void *keydata);
+
+#define STRMAP_INIT { \
+			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
+			.strdup_strings = 1,                          \
+		    }
+
+/*
+ * Initialize the members of the strmap.  Any keys added to the strmap will
+ * be strdup'ed with their memory managed by the strmap.
+ */
+void strmap_init(struct strmap *map);
+
+/*
+ * Same as strmap_init, but for those who want to control the memory management
+ * carefully instead of using the default of strdup_strings=1.
+ */
+void strmap_init_with_options(struct strmap *map,
+			      int strdup_strings);
+
+/*
+ * Remove all entries from the map, releasing any allocated resources.
+ */
+void strmap_clear(struct strmap *map, int free_values);
+
+/*
+ * Insert "str" into the map, pointing to "data".
+ *
+ * If an entry for "str" already exists, its data pointer is overwritten, and
+ * the original data pointer returned. Otherwise, returns NULL.
+ */
+void *strmap_put(struct strmap *map, const char *str, void *data);
+
+/*
+ * Return the data pointer mapped by "str", or NULL if the entry does not
+ * exist.
+ */
+void *strmap_get(struct strmap *map, const char *str);
+
+/*
+ * Return non-zero iff "str" is present in the map. This differs from
+ * strmap_get() in that it can distinguish entries with a NULL data pointer.
+ */
+int strmap_contains(struct strmap *map, const char *str);
+
+#endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 07/13] strmap: add more utility functions
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (5 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 06/13] strmap: new utility functions Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-04 20:13       ` Jeff King
  2020-11-02 18:55     ` [PATCH v3 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
                       ` (7 subsequent siblings)
  14 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This adds a number of additional convienence functions I want/need:
  * strmap_get_size()
  * strmap_empty()
  * strmap_remove()
  * strmap_for_each_entry()
  * strmap_get_entry()

I suspect the first four are self-explanatory.

strmap_get_entry() is similar to strmap_get() except that instead of just
returning the void* value that the string maps to, it returns the
strmap_entry that contains both the string and the void* value (or
NULL if the string isn't in the map).  This is helpful because it avoids
multiple lookups, e.g. in some cases a caller would need to call:
  * strmap_contains() to check that the map has an entry for the string
  * strmap_get() to get the void* value
  * <do some work to update the value>
  * strmap_put() to update/overwrite the value
If the void* pointer returned really is a pointer, then the last step is
unnecessary, but if the void* pointer is just cast to an integer then
strmap_put() will be needed.  In contrast, one can call strmap_get_entry()
and then:
  * check if the string was in the map by whether the pointer is NULL
  * access the value via entry->value
  * directly update entry->value
meaning that we can replace two or three hash table lookups with one.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 20 ++++++++++++++++++++
 strmap.h | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/strmap.c b/strmap.c
index 53f284eb20..829f1bc095 100644
--- a/strmap.c
+++ b/strmap.c
@@ -87,6 +87,11 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 	return old;
 }
 
+struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
+{
+	return find_strmap_entry(map, str);
+}
+
 void *strmap_get(struct strmap *map, const char *str)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
@@ -97,3 +102,18 @@ int strmap_contains(struct strmap *map, const char *str)
 {
 	return find_strmap_entry(map, str) != NULL;
 }
+
+void strmap_remove(struct strmap *map, const char *str, int free_value)
+{
+	struct strmap_entry entry, *ret;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.key = str;
+	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
+	if (!ret)
+		return;
+	if (free_value)
+		free(ret->value);
+	if (map->strdup_strings)
+		free((char*)ret->key);
+	free(ret);
+}
diff --git a/strmap.h b/strmap.h
index 96888c23ad..ee4307cca5 100644
--- a/strmap.h
+++ b/strmap.h
@@ -50,6 +50,12 @@ void strmap_clear(struct strmap *map, int free_values);
  */
 void *strmap_put(struct strmap *map, const char *str, void *data);
 
+/*
+ * Return the strmap_entry mapped by "str", or NULL if there is not such
+ * an item in map.
+ */
+struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str);
+
 /*
  * Return the data pointer mapped by "str", or NULL if the entry does not
  * exist.
@@ -62,4 +68,34 @@ void *strmap_get(struct strmap *map, const char *str);
  */
 int strmap_contains(struct strmap *map, const char *str);
 
+/*
+ * Remove the given entry from the strmap.  If the string isn't in the
+ * strmap, the map is not altered.
+ */
+void strmap_remove(struct strmap *map, const char *str, int free_value);
+
+/*
+ * Return how many entries the strmap has.
+ */
+static inline unsigned int strmap_get_size(struct strmap *map)
+{
+	return hashmap_get_size(&map->map);
+}
+
+/*
+ * Return whether the strmap is empty.
+ */
+static inline int strmap_empty(struct strmap *map)
+{
+	return strmap_get_size(map) == 0;
+}
+
+/*
+ * iterate through @map using @iter, @var is a pointer to a type strmap_entry
+ */
+#define strmap_for_each_entry(mystrmap, iter, var)	\
+	for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, 0); \
+		var; \
+		var = hashmap_iter_next_entry_offset(iter, 0))
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 08/13] strmap: enable faster clearing and reusing of strmaps
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (6 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 07/13] strmap: add more " Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
                       ` (6 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

When strmaps are used heavily, such as is done by my new merge-ort
algorithm, and strmaps need to be cleared but then re-used (because of
e.g. picking multiple commits to cherry-pick, or due to a recursive
merge having several different merges while recursing), free-ing and
reallocating map->table repeatedly can add up in time, especially since
it will likely be reallocated to a much smaller size but the previous
merge provides a good guide to the right size to use for the next merge.

Introduce strmap_partial_clear() to take advantage of this type of
situation; it will act similar to strmap_clear() except that
map->table's entries are zeroed instead of map->table being free'd.
Making use of this function reduced the cost of
clear_or_reinit_internal_opts() by about 20% in mert-ort, and dropped
the overall runtime of my rebase testcase by just under 2%.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 6 ++++++
 strmap.h | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/strmap.c b/strmap.c
index 829f1bc095..c410c5241a 100644
--- a/strmap.c
+++ b/strmap.c
@@ -64,6 +64,12 @@ void strmap_clear(struct strmap *map, int free_values)
 	hashmap_clear(&map->map);
 }
 
+void strmap_partial_clear(struct strmap *map, int free_values)
+{
+	strmap_free_entries_(map, free_values);
+	hashmap_partial_clear(&map->map);
+}
+
 void *strmap_put(struct strmap *map, const char *str, void *data)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
diff --git a/strmap.h b/strmap.h
index ee4307cca5..10b4642860 100644
--- a/strmap.h
+++ b/strmap.h
@@ -42,6 +42,12 @@ void strmap_init_with_options(struct strmap *map,
  */
 void strmap_clear(struct strmap *map, int free_values);
 
+/*
+ * Similar to strmap_clear() but leaves map->map->table allocated and
+ * pre-sized so that subsequent uses won't need as many rehashings.
+ */
+void strmap_partial_clear(struct strmap *map, int free_values);
+
 /*
  * Insert "str" into the map, pointing to "data".
  *
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 09/13] strmap: add functions facilitating use as a string->int map
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (7 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-04 20:21       ` Jeff King
  2020-11-02 18:55     ` [PATCH v3 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
                       ` (5 subsequent siblings)
  14 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Although strmap could be used as a string->int map, one either had to
allocate an int for every entry and then deallocate later, or one had to
do a bunch of casting between (void*) and (intptr_t).

Add some special functions that do the casting.  Also, rename put->set
for such wrapper functions since 'put' implied there may be some
deallocation needed if the string was already found in the map, which
isn't the case when we're storing an int value directly in the void*
slot instead of using the void* slot as a pointer to data.

A note on the name: if anyone has a better name suggestion than
strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
not been able to come up with a better name.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 11 +++++++
 strmap.h | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 107 insertions(+)

diff --git a/strmap.c b/strmap.c
index c410c5241a..0d10a884b5 100644
--- a/strmap.c
+++ b/strmap.c
@@ -123,3 +123,14 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		free((char*)ret->key);
 	free(ret);
 }
+
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
+{
+	struct strmap_entry *entry = find_strmap_entry(&map->map, str);
+	if (entry) {
+		intptr_t *whence = (intptr_t*)&entry->value;
+		*whence += amt;
+	}
+	else
+		strintmap_set(map, str, map->default_value + amt);
+}
diff --git a/strmap.h b/strmap.h
index 10b4642860..31474f781e 100644
--- a/strmap.h
+++ b/strmap.h
@@ -23,6 +23,11 @@ int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
 			.strdup_strings = 1,                          \
 		    }
+#define STRINTMAP_INIT { \
+			.map.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
+			.map.strdup_strings = 1,                          \
+			.default_value = 0,                               \
+		    }
 
 /*
  * Initialize the members of the strmap.  Any keys added to the strmap will
@@ -104,4 +109,95 @@ static inline int strmap_empty(struct strmap *map)
 		var; \
 		var = hashmap_iter_next_entry_offset(iter, 0))
 
+
+/*
+ * strintmap:
+ *    A map of string -> int, typecasting the void* of strmap to an int.
+ *
+ * Primary differences:
+ *    1) Since the void* value is just an int in disguise, there is no value
+ *       to free.  (Thus one fewer argument to strintmap_clear)
+ *    2) strintmap_get() returns an int; it also requires an extra parameter to
+ *       be specified so it knows what value to return if the underlying strmap
+ *       has not key matching the given string.
+ *    3) No strmap_put() equivalent; strintmap_set() and strintmap_incr()
+ *       instead.
+ */
+
+struct strintmap {
+	struct strmap map;
+	int default_value;
+};
+
+#define strintmap_for_each_entry(mystrmap, iter, var)	\
+	strmap_for_each_entry(&(mystrmap)->map, iter, var)
+
+static inline void strintmap_init(struct strintmap *map, int default_value)
+{
+	strmap_init(&map->map);
+	map->default_value = default_value;
+}
+
+static inline void strintmap_init_with_options(struct strintmap *map,
+					       int default_value,
+					       int strdup_strings)
+{
+	strmap_init_with_options(&map->map, strdup_strings);
+	map->default_value = default_value;
+}
+
+static inline void strintmap_clear(struct strintmap *map)
+{
+	strmap_clear(&map->map, 0);
+}
+
+static inline void strintmap_partial_clear(struct strintmap *map)
+{
+	strmap_partial_clear(&map->map, 0);
+}
+
+static inline int strintmap_contains(struct strintmap *map, const char *str)
+{
+	return strmap_contains(&map->map, str);
+}
+
+static inline void strintmap_remove(struct strintmap *map, const char *str)
+{
+	return strmap_remove(&map->map, str, 0);
+}
+
+static inline int strintmap_empty(struct strintmap *map)
+{
+	return strmap_empty(&map->map);
+}
+
+static inline unsigned int strintmap_get_size(struct strintmap *map)
+{
+	return strmap_get_size(&map->map);
+}
+
+/*
+ * Returns the value for str in the map.  If str isn't found in the map,
+ * the map's default_value is returned.
+ */
+static inline int strintmap_get(struct strintmap *map, const char *str)
+{
+	struct strmap_entry *result = strmap_get_entry(&map->map, str);
+	if (!result)
+		return map->default_value;
+	return (intptr_t)result->value;
+}
+
+static inline void strintmap_set(struct strintmap *map, const char *str,
+				 intptr_t v)
+{
+	strmap_put(&map->map, str, (void *)v);
+}
+
+/*
+ * Increment the value for str by amt.  If str isn't in the map, add it and
+ * set its value to default_value + amt.
+ */
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 10/13] strmap: add a strset sub-type
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (8 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-04 20:31       ` Jeff King
  2020-11-02 18:55     ` [PATCH v3 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
                       ` (4 subsequent siblings)
  14 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Similar to adding strintmap for special-casing a string -> int mapping,
add a strset type for cases where we really are only interested in using
strmap for storing a set rather than a mapping.  In this case, we'll
always just store NULL for the value but the different struct type makes
it clearer than code comments how a variable is intended to be used.

The difference in usage also results in some differences in API: a few
things that aren't necessary or meaningful are dropped (namely, the
free_values argument to *_clear(), and the *_get() function), and
strset_add() is chosen as the API instead of strset_put().

Finally, shortlog already had a more minimal strset API; so this adds a
strset_check_and_add() function for its benefit to allow it to switch
over to this strset implementation.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c |  8 +++++++
 strmap.h | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/strmap.c b/strmap.c
index 0d10a884b5..2aff985f40 100644
--- a/strmap.c
+++ b/strmap.c
@@ -134,3 +134,11 @@ void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
 	else
 		strintmap_set(map, str, map->default_value + amt);
 }
+
+int strset_check_and_add(struct strset *set, const char *str)
+{
+	if (strset_contains(set, str))
+		return 1;
+	strset_add(set, str);
+	return 0;
+}
diff --git a/strmap.h b/strmap.h
index 31474f781e..fca1e9f639 100644
--- a/strmap.h
+++ b/strmap.h
@@ -28,6 +28,10 @@ int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 			.map.strdup_strings = 1,                          \
 			.default_value = 0,                               \
 		    }
+#define STRSET_INIT { \
+			.map.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
+			.map.strdup_strings = 1,                          \
+		    }
 
 /*
  * Initialize the members of the strmap.  Any keys added to the strmap will
@@ -200,4 +204,71 @@ static inline void strintmap_set(struct strintmap *map, const char *str,
  */
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
 
+/*
+ * strset:
+ *    A set of strings.
+ *
+ * Primary differences with strmap:
+ *    1) The value is always NULL, and ignored.  As there is no value to free,
+ *       there is one fewer argument to strset_clear
+ *    2) No strset_get() because there is no value.
+ *    3) No strset_put(); use strset_add() instead.
+ */
+
+struct strset {
+	struct strmap map;
+};
+
+#define strset_for_each_entry(mystrset, iter, var)	\
+	strmap_for_each_entry(&(mystrset)->map, iter, var)
+
+static inline void strset_init(struct strset *set)
+{
+	strmap_init(&set->map);
+}
+
+static inline void strset_init_with_options(struct strset *set,
+					    int strdup_strings)
+{
+	strmap_init_with_options(&set->map, strdup_strings);
+}
+
+static inline void strset_clear(struct strset *set)
+{
+	strmap_clear(&set->map, 0);
+}
+
+static inline void strset_partial_clear(struct strset *set)
+{
+	strmap_partial_clear(&set->map, 0);
+}
+
+static inline int strset_contains(struct strset *set, const char *str)
+{
+	return strmap_contains(&set->map, str);
+}
+
+static inline void strset_remove(struct strset *set, const char *str)
+{
+	return strmap_remove(&set->map, str, 0);
+}
+
+static inline int strset_empty(struct strset *set)
+{
+	return strmap_empty(&set->map);
+}
+
+static inline unsigned int strset_get_size(struct strset *set)
+{
+	return strmap_get_size(&set->map);
+}
+
+static inline void strset_add(struct strset *set, const char *str)
+{
+	strmap_put(&set->map, str, NULL);
+}
+
+/* Returns 1 if str already in set.  Otherwise adds str to set and returns 0 */
+int strset_check_and_add(struct strset *set, const char *str);
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 11/13] strmap: enable allocations to come from a mem_pool
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (9 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-02 18:55     ` [PATCH v3 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
                       ` (3 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For heavy users of strmaps, allowing the keys and entries to be
allocated from a memory pool can provide significant overhead savings.
Add an option to strmap_init_with_options() to specify a memory pool.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 31 ++++++++++++++++++++++---------
 strmap.h | 11 ++++++++---
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/strmap.c b/strmap.c
index 2aff985f40..34bca92522 100644
--- a/strmap.c
+++ b/strmap.c
@@ -1,5 +1,6 @@
 #include "git-compat-util.h"
 #include "strmap.h"
+#include "mem-pool.h"
 
 int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 		     const struct hashmap_entry *entry1,
@@ -24,13 +25,15 @@ static struct strmap_entry *find_strmap_entry(struct strmap *map,
 
 void strmap_init(struct strmap *map)
 {
-	strmap_init_with_options(map, 1);
+	strmap_init_with_options(map, NULL, 1);
 }
 
 void strmap_init_with_options(struct strmap *map,
+			      struct mem_pool *pool,
 			      int strdup_strings)
 {
 	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
+	map->pool = pool;
 	map->strdup_strings = strdup_strings;
 }
 
@@ -42,6 +45,10 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	if (!map)
 		return;
 
+	if (!free_values && map->pool)
+		/* Memory other than util is owned by and freed with the pool */
+		return;
+
 	/*
 	 * We need to iterate over the hashmap entries and free
 	 * e->key and e->value ourselves; hashmap has no API to
@@ -52,9 +59,11 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	hashmap_for_each_entry(&map->map, &iter, e, ent) {
 		if (free_values)
 			free(e->value);
-		if (map->strdup_strings)
-			free((char*)e->key);
-		free(e);
+		if (!map->pool) {
+			if (map->strdup_strings)
+				free((char*)e->key);
+			free(e);
+		}
 	}
 }
 
@@ -81,11 +90,13 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 	} else {
 		const char *key = str;
 
-		entry = xmalloc(sizeof(*entry));
+		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
+				  : xmalloc(sizeof(*entry));
 		hashmap_entry_init(&entry->ent, strhash(str));
 
 		if (map->strdup_strings)
-			key = xstrdup(str);
+			key = map->pool ? mem_pool_strdup(map->pool, str)
+					: xstrdup(str);
 		entry->key = key;
 		entry->value = data;
 		hashmap_add(&map->map, &entry->ent);
@@ -119,9 +130,11 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		return;
 	if (free_value)
 		free(ret->value);
-	if (map->strdup_strings)
-		free((char*)ret->key);
-	free(ret);
+	if (!map->pool) {
+		if (map->strdup_strings)
+			free((char*)ret->key);
+		free(ret);
+	}
 }
 
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
diff --git a/strmap.h b/strmap.h
index fca1e9f639..6ffa6afb6a 100644
--- a/strmap.h
+++ b/strmap.h
@@ -3,8 +3,10 @@
 
 #include "hashmap.h"
 
+struct mempool;
 struct strmap {
 	struct hashmap map;
+	struct mem_pool *pool;
 	unsigned int strdup_strings:1;
 };
 
@@ -41,9 +43,10 @@ void strmap_init(struct strmap *map);
 
 /*
  * Same as strmap_init, but for those who want to control the memory management
- * carefully instead of using the default of strdup_strings=1.
+ * carefully instead of using the default of strdup_strings=1 and pool=NULL.
  */
 void strmap_init_with_options(struct strmap *map,
+			      struct mem_pool *pool,
 			      int strdup_strings);
 
 /*
@@ -144,9 +147,10 @@ static inline void strintmap_init(struct strintmap *map, int default_value)
 
 static inline void strintmap_init_with_options(struct strintmap *map,
 					       int default_value,
+					       struct mem_pool *pool,
 					       int strdup_strings)
 {
-	strmap_init_with_options(&map->map, strdup_strings);
+	strmap_init_with_options(&map->map, pool, strdup_strings);
 	map->default_value = default_value;
 }
 
@@ -228,9 +232,10 @@ static inline void strset_init(struct strset *set)
 }
 
 static inline void strset_init_with_options(struct strset *set,
+					    struct mem_pool *pool,
 					    int strdup_strings)
 {
-	strmap_init_with_options(&set->map, strdup_strings);
+	strmap_init_with_options(&set->map, pool, strdup_strings);
 }
 
 static inline void strset_clear(struct strset *set)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (10 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-04 20:43       ` Jeff King
  2020-11-02 18:55     ` [PATCH v3 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
                       ` (2 subsequent siblings)
  14 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

By default, we do not use a mempool and strdup_strings is true; in this
case, we can avoid both an extra allocation and an extra free by just
over-allocating for the strmap_entry leaving enough space at the end to
copy the key.  FLEXPTR_ALLOC_STR exists for exactly this purpose, so
make use of it.

Also, adjust the case when we are using a memory pool and strdup_strings
is true to just do one allocation from the memory pool instead of two so
that the strmap_clear() and strmap_remove() code can just avoid freeing
the key in all cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 35 ++++++++++++++++++-----------------
 strmap.h |  1 +
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/strmap.c b/strmap.c
index 34bca92522..9abd47fd4b 100644
--- a/strmap.c
+++ b/strmap.c
@@ -59,11 +59,8 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	hashmap_for_each_entry(&map->map, &iter, e, ent) {
 		if (free_values)
 			free(e->value);
-		if (!map->pool) {
-			if (map->strdup_strings)
-				free((char*)e->key);
+		if (!map->pool)
 			free(e);
-		}
 	}
 }
 
@@ -88,16 +85,23 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 		old = entry->value;
 		entry->value = data;
 	} else {
-		const char *key = str;
-
-		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
-				  : xmalloc(sizeof(*entry));
+		if (map->strdup_strings) {
+			if (!map->pool) {
+				FLEXPTR_ALLOC_STR(entry, key, str);
+			} else {
+				/* Remember +1 for nul byte twice below */
+				size_t len = strlen(str);
+				entry = mem_pool_alloc(map->pool,
+					       st_add3(sizeof(*entry), len, 1));
+				memcpy(entry->keydata, str, len+1);
+			}
+		} else if (!map->pool) {
+			entry = xmalloc(sizeof(*entry));
+		} else {
+			entry = mem_pool_alloc(map->pool, sizeof(*entry));
+		}
 		hashmap_entry_init(&entry->ent, strhash(str));
-
-		if (map->strdup_strings)
-			key = map->pool ? mem_pool_strdup(map->pool, str)
-					: xstrdup(str);
-		entry->key = key;
+		entry->key = map->strdup_strings ? entry->keydata : str;
 		entry->value = data;
 		hashmap_add(&map->map, &entry->ent);
 	}
@@ -130,11 +134,8 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		return;
 	if (free_value)
 		free(ret->value);
-	if (!map->pool) {
-		if (map->strdup_strings)
-			free((char*)ret->key);
+	if (!map->pool)
 		free(ret);
-	}
 }
 
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
diff --git a/strmap.h b/strmap.h
index 6ffa6afb6a..0dd80b276e 100644
--- a/strmap.h
+++ b/strmap.h
@@ -14,6 +14,7 @@ struct strmap_entry {
 	struct hashmap_entry ent;
 	const char *key;
 	void *value;
+	char keydata[FLEX_ARRAY]; /* if strdup_strings=1, key == &keydata[0] */
 };
 
 int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v3 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (11 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
@ 2020-11-02 18:55     ` Elijah Newren via GitGitGadget
  2020-11-04 20:48       ` Jeff King
  2020-11-04 20:52     ` [PATCH v3 00/13] Add struct strmap and associated utility functions Jeff King
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
  14 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-02 18:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Now that hashamp has lazy initialization and a HASHMAP_INIT macro,
hashmaps allocated on the stack can be initialized without a call to
hashmap_init() and in some cases makes the code a bit shorter.  Convert
some callsites over to take advantage of this.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 attr.c                  | 26 ++++++++------------------
 bloom.c                 |  3 +--
 builtin/difftool.c      |  9 ++++-----
 range-diff.c            |  4 +---
 revision.c              |  9 +--------
 t/helper/test-hashmap.c |  3 +--
 6 files changed, 16 insertions(+), 38 deletions(-)

diff --git a/attr.c b/attr.c
index a826b2ef1f..4ef85d668b 100644
--- a/attr.c
+++ b/attr.c
@@ -52,13 +52,6 @@ static inline void hashmap_unlock(struct attr_hashmap *map)
 	pthread_mutex_unlock(&map->mutex);
 }
 
-/*
- * The global dictionary of all interned attributes.  This
- * is a singleton object which is shared between threads.
- * Access to this dictionary must be surrounded with a mutex.
- */
-static struct attr_hashmap g_attr_hashmap;
-
 /* The container for objects stored in "struct attr_hashmap" */
 struct attr_hash_entry {
 	struct hashmap_entry ent;
@@ -80,11 +73,14 @@ static int attr_hash_entry_cmp(const void *unused_cmp_data,
 	return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
 }
 
-/* Initialize an 'attr_hashmap' object */
-static void attr_hashmap_init(struct attr_hashmap *map)
-{
-	hashmap_init(&map->map, attr_hash_entry_cmp, NULL, 0);
-}
+/*
+ * The global dictionary of all interned attributes.  This
+ * is a singleton object which is shared between threads.
+ * Access to this dictionary must be surrounded with a mutex.
+ */
+static struct attr_hashmap g_attr_hashmap = {
+	HASHMAP_INIT(attr_hash_entry_cmp, NULL)
+};
 
 /*
  * Retrieve the 'value' stored in a hashmap given the provided 'key'.
@@ -96,9 +92,6 @@ static void *attr_hashmap_get(struct attr_hashmap *map,
 	struct attr_hash_entry k;
 	struct attr_hash_entry *e;
 
-	if (!map->map.tablesize)
-		attr_hashmap_init(map);
-
 	hashmap_entry_init(&k.ent, memhash(key, keylen));
 	k.key = key;
 	k.keylen = keylen;
@@ -114,9 +107,6 @@ static void attr_hashmap_add(struct attr_hashmap *map,
 {
 	struct attr_hash_entry *e;
 
-	if (!map->map.tablesize)
-		attr_hashmap_init(map);
-
 	e = xmalloc(sizeof(struct attr_hash_entry));
 	hashmap_entry_init(&e->ent, memhash(key, keylen));
 	e->key = key;
diff --git a/bloom.c b/bloom.c
index 719c313a1c..b176f28f53 100644
--- a/bloom.c
+++ b/bloom.c
@@ -229,10 +229,9 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	diffcore_std(&diffopt);
 
 	if (diff_queued_diff.nr <= settings->max_changed_paths) {
-		struct hashmap pathmap;
+		struct hashmap pathmap = HASHMAP_INIT(pathmap_cmp, NULL);
 		struct pathmap_hash_entry *e;
 		struct hashmap_iter iter;
-		hashmap_init(&pathmap, pathmap_cmp, NULL, 0);
 
 		for (i = 0; i < diff_queued_diff.nr; i++) {
 			const char *path = diff_queued_diff.queue[i]->two->path;
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 7ac432b881..6e18e623fd 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -342,7 +342,10 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	const char *workdir, *tmp;
 	int ret = 0, i;
 	FILE *fp;
-	struct hashmap working_tree_dups, submodules, symlinks2;
+	struct hashmap working_tree_dups = HASHMAP_INIT(working_tree_entry_cmp,
+							NULL);
+	struct hashmap submodules = HASHMAP_INIT(pair_cmp, NULL);
+	struct hashmap symlinks2 = HASHMAP_INIT(pair_cmp, NULL);
 	struct hashmap_iter iter;
 	struct pair_entry *entry;
 	struct index_state wtindex;
@@ -383,10 +386,6 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	rdir_len = rdir.len;
 	wtdir_len = wtdir.len;
 
-	hashmap_init(&working_tree_dups, working_tree_entry_cmp, NULL, 0);
-	hashmap_init(&submodules, pair_cmp, NULL, 0);
-	hashmap_init(&symlinks2, pair_cmp, NULL, 0);
-
 	child.no_stdin = 1;
 	child.git_cmd = 1;
 	child.use_shell = 0;
diff --git a/range-diff.c b/range-diff.c
index befeecae44..b9950f10c8 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -232,11 +232,9 @@ static int patch_util_cmp(const void *dummy, const struct patch_util *a,
 
 static void find_exact_matches(struct string_list *a, struct string_list *b)
 {
-	struct hashmap map;
+	struct hashmap map = HASHMAP_INIT((hashmap_cmp_fn)patch_util_cmp, NULL);
 	int i;
 
-	hashmap_init(&map, (hashmap_cmp_fn)patch_util_cmp, NULL, 0);
-
 	/* First, add the patches of a to a hash map */
 	for (i = 0; i < a->nr; i++) {
 		struct patch_util *util = a->items[i].util;
diff --git a/revision.c b/revision.c
index f27649d45d..c6e169e3eb 100644
--- a/revision.c
+++ b/revision.c
@@ -124,11 +124,6 @@ static int path_and_oids_cmp(const void *hashmap_cmp_fn_data,
 	return strcmp(e1->path, e2->path);
 }
 
-static void paths_and_oids_init(struct hashmap *map)
-{
-	hashmap_init(map, path_and_oids_cmp, NULL, 0);
-}
-
 static void paths_and_oids_clear(struct hashmap *map)
 {
 	struct hashmap_iter iter;
@@ -213,7 +208,7 @@ void mark_trees_uninteresting_sparse(struct repository *r,
 				     struct oidset *trees)
 {
 	unsigned has_interesting = 0, has_uninteresting = 0;
-	struct hashmap map;
+	struct hashmap map = HASHMAP_INIT(path_and_oids_cmp, NULL);
 	struct hashmap_iter map_iter;
 	struct path_and_oids_entry *entry;
 	struct object_id *oid;
@@ -237,8 +232,6 @@ void mark_trees_uninteresting_sparse(struct repository *r,
 	if (!has_uninteresting || !has_interesting)
 		return;
 
-	paths_and_oids_init(&map);
-
 	oidset_iter_init(trees, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		struct tree *tree = lookup_tree(r, oid);
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index 2475663b49..36ff07bd4b 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -151,12 +151,11 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 int cmd__hashmap(int argc, const char **argv)
 {
 	struct strbuf line = STRBUF_INIT;
-	struct hashmap map;
 	int icase;
+	struct hashmap map = HASHMAP_INIT(test_entry_cmp, &icase);
 
 	/* init hash map */
 	icase = argc > 1 && !strcmp("ignorecase", argv[1]);
-	hashmap_init(&map, test_entry_cmp, &icase, 0);
 
 	/* process commands from stdin */
 	while (strbuf_getline(&line, stdin) != EOF) {
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 03/10] hashmap: allow re-use after hashmap_free()
  2020-10-30 15:37       ` Elijah Newren
@ 2020-11-03 16:08         ` Jeff King
  2020-11-03 16:16           ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-11-03 16:08 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 08:37:42AM -0700, Elijah Newren wrote:

> > This part I disagree with. If we did:
> >
> >   #define HASHMAP_INIT(fn, data) = { .cmpfn = cmpfn, cmpfn_data = data }
> >
> > then many callers could avoid handling the lazy-init themselves. E.g.:
> 
> Ah, gotcha.  That makes sense to me.  Given that 43 out of 47 callers
> of hashmap_init use cmpfn_data = NULL, should I shorten it to just one
> parameter for the macro, and let the four special cases keep calling
> hashmap_init() to specify a non-NULL cmpfn_data?

I'd be fine with it either way. I actually wrote it without the data
parameter at first, then changed my mine and added it in. ;)

You could also do:

  #define HASHMAP_INIT_DATA(fn, data) { .cmpfn = cmpfn, cmpfn_data = data }
  #define HASHMAP_INIT(fn) HASHMAP_INIT_DATA(fn, NULL)

if you want to keep most callers simple.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear()
  2020-10-30 16:03       ` Elijah Newren
@ 2020-11-03 16:10         ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-11-03 16:10 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 09:03:38AM -0700, Elijah Newren wrote:

> > It would be nice if we had some actual perf numbers to report here, so
> > we could know exactly how much it was buying us. But I guess things are
> > a bit out-of-order there. You want to do this series first and then
> > build merge-ort on top as a user. We could introduce the basic data
> > structure first, then merge-ort, and then start applying optimizations
> > with real-world measurements. But I'm not sure it's worth the amount of
> > time you'd have to spend to reorganize in that way.
> 
> Yeah, the perf benefits didn't really come until I added a
> strmap_clear() based on this, so as you discovered I put perf numbers
> in patch 7 of this series.  Should I add a mention of the later commit
> message at this point in the series?

Nah, I think it's OK as it is. That kind of thing matters more for
reviewing than when you find the commit later on. And we're already
discussing it during the review.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 06/10] strmap: add more utility functions
  2020-10-30 16:43       ` Elijah Newren
@ 2020-11-03 16:12         ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-11-03 16:12 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 09:43:33AM -0700, Elijah Newren wrote:

> > I suspect we need the same "var = NULL" that hashmap recently got in
> > 0ad621f61e (hashmap_for_each_entry(): workaround MSVC's runtime check
> > failure #3, 2020-09-30). Alternatively, I think you could drop
> > OFFSETOF_VAR completely in favor offsetof(struct strmap_entry, ent).
> >
> > In fact, since we know the correct type for "var", we _could_ declare it
> > ourselves in a new block enclosing the loop. But that is probably making
> > the code too magic; people reading the code would say "huh? where is
> > entry declared?".
> 
> Actually, since we know ent is the first entry in strmap, the offset
> is always 0.  So can't we just avoid OFFSETOF_VAR() and offsetof()
> entirely, by just using hashmap_iter_first() and hashmap_iter_next()?
> I'm going to try that.

Yes, I think that would work fine. You may want to add a comment to the
struct indicating that it's important for the hashmap_entry to be at the
front of the struct. Using offsetof() means that it's impossible to get
it wrong, though.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 03/10] hashmap: allow re-use after hashmap_free()
  2020-11-03 16:08         ` Jeff King
@ 2020-11-03 16:16           ` Elijah Newren
  0 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren @ 2020-11-03 16:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Nov 3, 2020 at 8:08 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Oct 30, 2020 at 08:37:42AM -0700, Elijah Newren wrote:
>
> > > This part I disagree with. If we did:
> > >
> > >   #define HASHMAP_INIT(fn, data) = { .cmpfn = cmpfn, cmpfn_data = data }
> > >
> > > then many callers could avoid handling the lazy-init themselves. E.g.:
> >
> > Ah, gotcha.  That makes sense to me.  Given that 43 out of 47 callers
> > of hashmap_init use cmpfn_data = NULL, should I shorten it to just one
> > parameter for the macro, and let the four special cases keep calling
> > hashmap_init() to specify a non-NULL cmpfn_data?
>
> I'd be fine with it either way. I actually wrote it without the data
> parameter at first, then changed my mine and added it in. ;)
>
> You could also do:
>
>   #define HASHMAP_INIT_DATA(fn, data) { .cmpfn = cmpfn, cmpfn_data = data }
>   #define HASHMAP_INIT(fn) HASHMAP_INIT_DATA(fn, NULL)
>
> if you want to keep most callers simple.

I ended up going with your HASHMAP_INIT(fn, data) in v3 that I
submitted yesterday (except that you have a stray '=', are missing a
'.' in front of cmpfn_data, and you'll trigger BUG()s if you don't
also add .do_count_items = 1, but those are all minor fixups).  In the
future, if we determine we want/need the extra simplicity then we can
always convert to this newer suggestion.  I don't think it's that big
a deal either way.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map
  2020-10-30 17:28       ` Elijah Newren
@ 2020-11-03 16:20         ` Jeff King
  2020-11-03 16:46           ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-11-03 16:20 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 10:28:51AM -0700, Elijah Newren wrote:

> > You might want to mention that this _could_ be done as just accessors to
> > strmap, but using a separate struct provides type safety against
> > misusing pointers as integers or vice versa.
> 
> If I just did it as accessors, it makes it harder for myself and
> others to remember what my huge piles of strmaps in merge-ort do; I
> found that it became easier to follow the code and remember what
> things were doing when some were marked as strmap, some as strintmap,
> and some as strset.

Oh, I'm definitely on board with that argument. I was just suggesting
you might want to put it in the commit message for posterity.

> > > +/*
> > > + * strintmap:
> > > + *    A map of string -> int, typecasting the void* of strmap to an int.
> >
> > Are the size and signedness of an int flexible enough for all uses?
> 
> If some users want signed values and others want unsigned, I'm not
> sure how we can satisfy both.  Maybe make a struintmap?

Right, that was sort of my question: do your users actually want it
signed or not. Sounds like they do want it signed, and don't mind the
loss of range.

> Perhaps that could be added later if uses come up for it?  Some of my
> uses need int, the rest of them wouldn't care about int vs unsigned.

Yeah, if you don't have any callers which care, I'd definitely punt on
it for now.

> If someone does care about the full range of bits up to 64 on relevant
> platforms, I guess I should make it strintptr_t_map.

Yeah, that's what I was wondering. I suspect the use case for that is
pretty narrow, though. If you really care about having a 64-bit value
for some data, then you probably want it _everywhere_, not just on
64-bit platforms. I guess the exception would be if you're mapping into
size_t's or something.

I think my question was as much "did you think about range issues for
your intended users" as "should we provide more range in this map type".
And it sounds like you have thought about that, so I'm happy proceeding.

> But besides the
> egregiously ugly name, one advantage of int over intptr_t (or unsigned
> over uintptr_t) is that you can use it in a printf easily:
>    printf("Size: %d\n", strintmap_get(&foo, 0));
> whereas if it strintmap_get() returns an intptr_t, then it's a royal
> mess to attempt to portably use it without adding additional manual
> casts.  Maybe I was just missing something obvious, but I couldn't
> figure out the %d, %ld, %lld, PRIdMAX, etc. choices and get the
> statement to compile on all platforms, so I'd always just cast to int
> or unsigned at the time of calling printf.

The right way is:

  printf("Size: %"PRIdMAX", (intmax_t) your_intptr_t);

which will always do the right thing no matter the size (at the minor
cost of passing a larger-than-necessary parameter, but if you're
micro-optimizing then calling printf at all is probably already a
problem).

But yeah, in general using a real "int" is much more convenient and if
there's no reason to avoid it for range problems, I think it's
preferable.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool
  2020-10-30 19:31       ` Elijah Newren
@ 2020-11-03 16:24         ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-11-03 16:24 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 12:31:13PM -0700, Elijah Newren wrote:

> > I think we could fall back to a FLEXPTR when there's no mempool (or even
> > when there is, though you'd be on your own to reimplement the
> > computation parts of FLEXPTR_ALLOC). I'm not sure how ugly it would end
> > up.
> 
> Yeah, we'd need a mempool-specific reimplementation of FLEXPTR_ALLOC
> with the mempool, and just avoid using it at all whenever
> strdup_strings was 0.  Seems slightly ugly, but maybe it wouldn't be
> too bad.  I could look into it.

It looks like you went this route (fall back to FLEXPTR) in the re-roll
you posted. I haven't looked at it carefully yet, but I suspect it will
be just fine to me (I probably would have accepted "no, it makes the
code too ugly; if you want efficiency use a mempool" as well, but I'll
see how ugly it turned out. ;) ).

> Anyway, at the time I
> put the mempool into strmaps and made use of it in relevant places,
> one of my rebase testcases saw an almost 5% reduction in overall
> execution time.  I'm sure it would have been over 5% if I had
> reordered it to come after my final rename optimization.

Thanks, it's nice to have a ballpark like that. It might be worth
putting it into the commit message, even if it's hand-wavy:

  This seemed to provide about 5% speedup for some rebase test cases I
  ran. Unfortunately you can't just time this commit and its parent,
  since we aren't yet actually using strmap in the code yet.

But again, I think the main value of that is during review, so if it
doesn't make it into the commit message, I'm OK.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-10-30 19:55       ` Elijah Newren
@ 2020-11-03 16:26         ` Jeff King
  2020-11-03 16:48           ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-11-03 16:26 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, Oct 30, 2020 at 12:55:51PM -0700, Elijah Newren wrote:

> > But I think in the current scheme that "free" is somewhat overloaded,
> > and if we end with a "clear" and a "free" that seems confusing to me.
> 
> Hmm...there are quite a few calls to hashmap_free() and
> hashmap_free_entries() throughout the codebase.  I'm wondering if I
> should make switching these over to your new naming suggestions a
> separate follow-on series from this one, so that if there are any
> conflicts with other series it doesn't need to hold these first 10
> patches up.

Yeah, it will definitely need a lot of mechanical fix-up. Those kinds of
conflicts aren't usually a big deal. Junio will have to resolve them,
but if the resolution is easy and mechanical, then it's not likely to
hold up either topic.

> If I do that, I could also add a patch to convert several callers of
> hashmap_init() to use the new HASHMAP_INIT() macro, and another patch
> to convert shortlog to using my strset instead of its own.

Yeah, both would be nice. I'm happy if it comes as part of the series,
or separately on top.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map
  2020-11-03 16:20         ` Jeff King
@ 2020-11-03 16:46           ` Elijah Newren
  0 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren @ 2020-11-03 16:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Nov 3, 2020 at 8:20 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Oct 30, 2020 at 10:28:51AM -0700, Elijah Newren wrote:
>
> > > You might want to mention that this _could_ be done as just accessors to
> > > strmap, but using a separate struct provides type safety against
> > > misusing pointers as integers or vice versa.
> >
> > If I just did it as accessors, it makes it harder for myself and
> > others to remember what my huge piles of strmaps in merge-ort do; I
> > found that it became easier to follow the code and remember what
> > things were doing when some were marked as strmap, some as strintmap,
> > and some as strset.
>
> Oh, I'm definitely on board with that argument. I was just suggesting
> you might want to put it in the commit message for posterity.
>
> > > > +/*
> > > > + * strintmap:
> > > > + *    A map of string -> int, typecasting the void* of strmap to an int.
> > >
> > > Are the size and signedness of an int flexible enough for all uses?
> >
> > If some users want signed values and others want unsigned, I'm not
> > sure how we can satisfy both.  Maybe make a struintmap?
>
> Right, that was sort of my question: do your users actually want it
> signed or not. Sounds like they do want it signed, and don't mind the
> loss of range.
>
> > Perhaps that could be added later if uses come up for it?  Some of my
> > uses need int, the rest of them wouldn't care about int vs unsigned.
>
> Yeah, if you don't have any callers which care, I'd definitely punt on
> it for now.
>
> > If someone does care about the full range of bits up to 64 on relevant
> > platforms, I guess I should make it strintptr_t_map.
>
> Yeah, that's what I was wondering. I suspect the use case for that is
> pretty narrow, though. If you really care about having a 64-bit value
> for some data, then you probably want it _everywhere_, not just on
> 64-bit platforms. I guess the exception would be if you're mapping into
> size_t's or something.
>
> I think my question was as much "did you think about range issues for
> your intended users" as "should we provide more range in this map type".
> And it sounds like you have thought about that, so I'm happy proceeding.
>
> > But besides the
> > egregiously ugly name, one advantage of int over intptr_t (or unsigned
> > over uintptr_t) is that you can use it in a printf easily:
> >    printf("Size: %d\n", strintmap_get(&foo, 0));
> > whereas if it strintmap_get() returns an intptr_t, then it's a royal
> > mess to attempt to portably use it without adding additional manual
> > casts.  Maybe I was just missing something obvious, but I couldn't
> > figure out the %d, %ld, %lld, PRIdMAX, etc. choices and get the
> > statement to compile on all platforms, so I'd always just cast to int
> > or unsigned at the time of calling printf.
>
> The right way is:
>
>   printf("Size: %"PRIdMAX", (intmax_t) your_intptr_t);

Ah, intmax_t; that's what I was missing.

> which will always do the right thing no matter the size (at the minor
> cost of passing a larger-than-necessary parameter, but if you're
> micro-optimizing then calling printf at all is probably already a
> problem).
>
> But yeah, in general using a real "int" is much more convenient and if
> there's no reason to avoid it for range problems, I think it's
> preferable.

Yep, I like the simplicity of "int", the signedness of "int" and it
has far more than enough range on all platforms (most my strintmaps
actually map to enum values, but my largest int usage is for counting
up to at most how many files are involved in rename detection.  Even
microsoft repos only have a number of files present in the repository
that registers in the low millions, and I'm only dealing with the
subset of those files involved in rename detection, which should be
much smaller).

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-11-03 16:26         ` Jeff King
@ 2020-11-03 16:48           ` Elijah Newren
  0 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren @ 2020-11-03 16:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Nov 3, 2020 at 8:26 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Oct 30, 2020 at 12:55:51PM -0700, Elijah Newren wrote:
>
> > > But I think in the current scheme that "free" is somewhat overloaded,
> > > and if we end with a "clear" and a "free" that seems confusing to me.
> >
> > Hmm...there are quite a few calls to hashmap_free() and
> > hashmap_free_entries() throughout the codebase.  I'm wondering if I
> > should make switching these over to your new naming suggestions a
> > separate follow-on series from this one, so that if there are any
> > conflicts with other series it doesn't need to hold these first 10
> > patches up.
>
> Yeah, it will definitely need a lot of mechanical fix-up. Those kinds of
> conflicts aren't usually a big deal. Junio will have to resolve them,
> but if the resolution is easy and mechanical, then it's not likely to
> hold up either topic.
>
> > If I do that, I could also add a patch to convert several callers of
> > hashmap_init() to use the new HASHMAP_INIT() macro, and another patch
> > to convert shortlog to using my strset instead of its own.
>
> Yeah, both would be nice. I'm happy if it comes as part of the series,
> or separately on top.

After sending the email, I ended up deciding to convert the callers
just to sanity check the HASHMAP_INIT macro and discovered that the
code will BUG() if you don't also include .do_count_items = 1.  So, I
just decided to include that in the v3 of the series after all.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v3 07/13] strmap: add more utility functions
  2020-11-02 18:55     ` [PATCH v3 07/13] strmap: add more " Elijah Newren via GitGitGadget
@ 2020-11-04 20:13       ` Jeff King
  2020-11-04 20:24         ` Elijah Newren
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-11-04 20:13 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Mon, Nov 02, 2020 at 06:55:07PM +0000, Elijah Newren via GitGitGadget wrote:

> +/*
> + * iterate through @map using @iter, @var is a pointer to a type strmap_entry
> + */
> +#define strmap_for_each_entry(mystrmap, iter, var)	\
> +	for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, 0); \
> +		var; \
> +		var = hashmap_iter_next_entry_offset(iter, 0))
> +

I think this resolves my offset question from the last round. But I
wonder if you tried:

  #define strmap_for_each_entry(mystrmap, iter, var) \
	hashmap_for_each_entry(&(mystrmap)->map, iter, var, ent)

which is a bit more abstract and should function the same (I think; I
didn't try it).

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v3 09/13] strmap: add functions facilitating use as a string->int map
  2020-11-02 18:55     ` [PATCH v3 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
@ 2020-11-04 20:21       ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-11-04 20:21 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Mon, Nov 02, 2020 at 06:55:09PM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> Although strmap could be used as a string->int map, one either had to
> allocate an int for every entry and then deallocate later, or one had to
> do a bunch of casting between (void*) and (intptr_t).
> 
> Add some special functions that do the casting.  Also, rename put->set
> for such wrapper functions since 'put' implied there may be some
> deallocation needed if the string was already found in the map, which
> isn't the case when we're storing an int value directly in the void*
> slot instead of using the void* slot as a pointer to data.
> 
> A note on the name: if anyone has a better name suggestion than
> strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
> not been able to come up with a better name.

You can probably drop this last paragraph. It's good for review, but
probably not in the commit message. :)

> +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
> +{
> +	struct strmap_entry *entry = find_strmap_entry(&map->map, str);
> +	if (entry) {
> +		intptr_t *whence = (intptr_t*)&entry->value;
> +		*whence += amt;
> +	}
> +	else
> +		strintmap_set(map, str, map->default_value + amt);
> +}

Here we use the new default_value. Neat.

> diff --git a/strmap.h b/strmap.h
> index 10b4642860..31474f781e 100644
> --- a/strmap.h
> +++ b/strmap.h
> @@ -23,6 +23,11 @@ int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
>  			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
>  			.strdup_strings = 1,                          \
>  		    }
> +#define STRINTMAP_INIT { \
> +			.map.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
> +			.map.strdup_strings = 1,                          \
> +			.default_value = 0,                               \
> +		    }

Re-using STRMAP_INIT would shorten this (and avoid repeating internal
details of how strmap works). Like:

  #define STRINTMAP_INIT { \
	.map = STRMAP_INIT, \
	.default_value = 0, \
  }

You can also omit default_value, as the value of any un-mentioned
elements will get the usual C zero-initialization. So:

  #define STRINTMAP_INIT { .map = STRMAP_INIT }

would be sufficient (though I don't mind making the .default_value part
explicit). It could also be a parameter to the macro, but I suspect it
would be rarely used. I don't mind leaving it as something that advanced
callers can get from using strintmap_init().

> +/*
> + * strintmap:
> + *    A map of string -> int, typecasting the void* of strmap to an int.
> + *
> + * Primary differences:
> + *    1) Since the void* value is just an int in disguise, there is no value
> + *       to free.  (Thus one fewer argument to strintmap_clear)
> + *    2) strintmap_get() returns an int; it also requires an extra parameter to
> + *       be specified so it knows what value to return if the underlying strmap
> + *       has not key matching the given string.
> + *    3) No strmap_put() equivalent; strintmap_set() and strintmap_incr()
> + *       instead.
> + */

I think (2) here is out-of-date, as we now use map->default_value.

> +/*
> + * Returns the value for str in the map.  If str isn't found in the map,
> + * the map's default_value is returned.
> + */
> +static inline int strintmap_get(struct strintmap *map, const char *str)
> +{
> +	struct strmap_entry *result = strmap_get_entry(&map->map, str);
> +	if (!result)
> +		return map->default_value;
> +	return (intptr_t)result->value;
> +}

And we get to reuse default_value here again. Nice.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v3 07/13] strmap: add more utility functions
  2020-11-04 20:13       ` Jeff King
@ 2020-11-04 20:24         ` Elijah Newren
  0 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren @ 2020-11-04 20:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Wed, Nov 4, 2020 at 12:13 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Nov 02, 2020 at 06:55:07PM +0000, Elijah Newren via GitGitGadget wrote:
>
> > +/*
> > + * iterate through @map using @iter, @var is a pointer to a type strmap_entry
> > + */
> > +#define strmap_for_each_entry(mystrmap, iter, var)   \
> > +     for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, 0); \
> > +             var; \
> > +             var = hashmap_iter_next_entry_offset(iter, 0))
> > +
>
> I think this resolves my offset question from the last round. But I
> wonder if you tried:
>
>   #define strmap_for_each_entry(mystrmap, iter, var) \
>         hashmap_for_each_entry(&(mystrmap)->map, iter, var, ent)
>
> which is a bit more abstract and should function the same (I think; I
> didn't try it).

I tried another variant or two besides what I used here, but not the
one you suggest.  Your suggestion seems obvious and nicer now that you
point it out.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v3 10/13] strmap: add a strset sub-type
  2020-11-02 18:55     ` [PATCH v3 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
@ 2020-11-04 20:31       ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-11-04 20:31 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Mon, Nov 02, 2020 at 06:55:10PM +0000, Elijah Newren via GitGitGadget wrote:

> +int strset_check_and_add(struct strset *set, const char *str)
> +{
> +	if (strset_contains(set, str))
> +		return 1;
> +	strset_add(set, str);
> +	return 0;
> +}

With this implementation, I wonder if it is worth having such a
specialized function. The value of an atomic check-and-add operation is
that it can reuse the effort to hash the string for both operations (it
could also reuse any open-table probing effort, but for a chained hash
like our implementation, it's cheap to add a new entry to the front of
the list).

I doubt it matters all that much for the use case in shortlog. Perhaps
we should just open-code it there for now, and we can revisit it if
another user comes up.

> --- a/strmap.h
> +++ b/strmap.h
> @@ -28,6 +28,10 @@ int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
>  			.map.strdup_strings = 1,                          \
>  			.default_value = 0,                               \
>  		    }
> +#define STRSET_INIT { \
> +			.map.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
> +			.map.strdup_strings = 1,                          \
> +		    }

As with strint, this could be:

  #define STRSET_INIT { .map = STRMAP_INIT }

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v3 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  2020-11-02 18:55     ` [PATCH v3 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
@ 2020-11-04 20:43       ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-11-04 20:43 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Mon, Nov 02, 2020 at 06:55:12PM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> By default, we do not use a mempool and strdup_strings is true; in this
> case, we can avoid both an extra allocation and an extra free by just
> over-allocating for the strmap_entry leaving enough space at the end to
> copy the key.  FLEXPTR_ALLOC_STR exists for exactly this purpose, so
> make use of it.
> 
> Also, adjust the case when we are using a memory pool and strdup_strings
> is true to just do one allocation from the memory pool instead of two so
> that the strmap_clear() and strmap_remove() code can just avoid freeing
> the key in all cases.

This turned out to be much less painful than I feared, and I think is
worth doing. Thanks for digging on it.

> +		if (map->strdup_strings) {
> +			if (!map->pool) {
> +				FLEXPTR_ALLOC_STR(entry, key, str);
> +			} else {
> +				/* Remember +1 for nul byte twice below */
> +				size_t len = strlen(str);
> +				entry = mem_pool_alloc(map->pool,
> +					       st_add3(sizeof(*entry), len, 1));
> +				memcpy(entry->keydata, str, len+1);
> +			}

Perhaps:

  size_t len = st_add(strlen(str), 1); /* include NUL */
  entry = mem_pool_alloc(map->pool, st_add(sizeof(*entry), len));
  memcpy(entry->keydata, str, len);

would be more obvious than the "remember to do it twice" comment?

With a FLEXPTR, I don't think you need keydata at all (since we would
never use that name; note that we don't even pass it in at all to
FLEXPTR_ALLOC_STR). Without that, I think your memcpy becomes:

  memcpy(entry + 1, str, len);

Remember that "entry" is a typed pointer, so "1" is really moving
sizeof(*entry) bytes.

> +		} else if (!map->pool) {
> +			entry = xmalloc(sizeof(*entry));
> +		} else {
> +			entry = mem_pool_alloc(map->pool, sizeof(*entry));
> +		}

OK, so if we're not strdup-ing then we either get a mempool or a fresh
entry. Makes sense.

>  		hashmap_entry_init(&entry->ent, strhash(str));
> -
> -		if (map->strdup_strings)
> -			key = map->pool ? mem_pool_strdup(map->pool, str)
> -					: xstrdup(str);
> -		entry->key = key;
> +		entry->key = map->strdup_strings ? entry->keydata : str;

I think this is subtly wrong in the FLEXPTR case. The data isn't in
keydata; it's directly after the struct. That's _usually_ the same
thing, but:

  - the compiler can put struct padding at the end if it wants

  - FLEX_ARRAY is usually zero, but for compatibility on some platforms
    it must be 1

The call to FLEXPTR_ALLOC_STR() will have already set it up properly
(and this is at best writing the same value, and at worst messing it
up).

I think you probably want to leave the FLEXPTR_ALLOC_STR() part alone,
put a:

  entry->key = (void *)(entry + 1);

line in the mem_pool code path, and then here do:

  if (!strdup_strings)
	entry->key = str;

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v3 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization
  2020-11-02 18:55     ` [PATCH v3 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
@ 2020-11-04 20:48       ` Jeff King
  0 siblings, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-11-04 20:48 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Mon, Nov 02, 2020 at 06:55:13PM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> Now that hashamp has lazy initialization and a HASHMAP_INIT macro,
> hashmaps allocated on the stack can be initialized without a call to
> hashmap_init() and in some cases makes the code a bit shorter.  Convert
> some callsites over to take advantage of this.

These all look obviously correct. I suspect there are more, but there's
no need for us to be thorough at this point.

> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -232,11 +232,9 @@ static int patch_util_cmp(const void *dummy, const struct patch_util *a,
>  
>  static void find_exact_matches(struct string_list *a, struct string_list *b)
>  {
> -	struct hashmap map;
> +	struct hashmap map = HASHMAP_INIT((hashmap_cmp_fn)patch_util_cmp, NULL);

Not related to your patch, but we should in general try to define these
comparison functions with the correct signature, rather than cast them.

I think I've already inflated your series enough, so let's leave it for
now, but just a mental note that it might be useful to circle back and
fix these en masse (I know we cleaned up several a while back, so I'm
actually surprised to see one still here).

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v3 00/13] Add struct strmap and associated utility functions
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (12 preceding siblings ...)
  2020-11-02 18:55     ` [PATCH v3 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
@ 2020-11-04 20:52     ` Jeff King
  2020-11-04 22:20       ` Elijah Newren
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
  14 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-11-04 20:52 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Mon, Nov 02, 2020 at 06:55:00PM +0000, Elijah Newren via GitGitGadget wrote:

> Changes since v2 (almost all of which were suggestions from Peff):

Thanks again for your work on this series, and your willingness to
listen to my various suggestions. ;)

This mostly looks good to me. I pointed out a few minor nits in reply to
individual patches, but there's at least one correctness problem, so
we'll need a v4.

> Things that I'm still unsure about:
> 
>  * strintmap_init() takes a default_value parameter, as suggested by Peff.
>    But this makes the function name strintmap_init_with_options() weird,
>    because strintmap_init() already takes one option, so it seems like the
>    name needs to replace "options" with "more_options". But that's kinda
>    ugly too. I'm guessing strintmap_init_with_options() is fine as-is, but
>    I'm wondering if anyone else thinks it looks weird and if so if there is
>    anything I should do about it.

You could drop default_value from strintmap_init_with_options(). I'd
_guess_ most callers would be happy with 0, but you'd know much better
than I what your first crop of callers will want.

I'm happy with it either way.

> Things Peff mentioned on v2 that I did NOT do:
> 
>  * Peff brought up some questions about mapping strintmap to an int rather
>    than an unsigned or intptr_t. I discussed my rationale in the thread

Yeah, I'm well convinced that what you have here is fine.

> Things Peff mentioned on v1 that are still not included and which Peff
> didn't comment on for v2, but which may still be worth mentioning again:
> 
>  * Peff brought up the idea of having a free_values member instead of having
>    a free_values parameter to strmap_clear(). That'd just mean moving the
>    parameter from strmap_clear() to strmap_init() and would be easy to do,
>    but he sounded like he was just throwing it out as an idea and I didn't
>    have a strong opinion, so I left it as-is. If others have
>    opinions/preferences, changing it is easy right now.

Yeah, I was mostly thinking out loud. What you have here looks fine to
me.

>  * Peff early on wanted the strmap_entry type to have a char key[FLEX_ALLOC]
>    instead of having a (const) char *key. I spent a couple more days on this
>    despite him not mentioning it while reviewing v2, and finally got it
>    working this time and running valgrind-free. Note that such a change
>    means always copying the key instead of allowing it as an option. After
>    implementing it, I timed it and it slowed down my important testcase by
>    just over 6%. So I chucked it. I think the FLEXPTR_ALLOC_STR usage in
>    combination with defaulting to strdup_strings=1 gives us most the
>    benefits Peff wanted, while still allowing merge-ort to reuse strings
>    when it's important.

Yes, I'd agree that FLEXPTR is a good middle ground. If I really manage
to find a caller later where I think the complexity might be worth
saving a few bytes, perhaps I'll try it then and get some real
measurements. My guess is that won't ever actually happen. :)

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v3 00/13] Add struct strmap and associated utility functions
  2020-11-04 20:52     ` [PATCH v3 00/13] Add struct strmap and associated utility functions Jeff King
@ 2020-11-04 22:20       ` Elijah Newren
  0 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren @ 2020-11-04 22:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Wed, Nov 4, 2020 at 12:52 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Nov 02, 2020 at 06:55:00PM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Changes since v2 (almost all of which were suggestions from Peff):
>
> Thanks again for your work on this series, and your willingness to
> listen to my various suggestions. ;)
>
> This mostly looks good to me. I pointed out a few minor nits in reply to
> individual patches, but there's at least one correctness problem, so
> we'll need a v4.

Cool, thanks for the careful reviews; I'll fix it up and send a v4 out.

> > Things that I'm still unsure about:
> >
> >  * strintmap_init() takes a default_value parameter, as suggested by Peff.
> >    But this makes the function name strintmap_init_with_options() weird,
> >    because strintmap_init() already takes one option, so it seems like the
> >    name needs to replace "options" with "more_options". But that's kinda
> >    ugly too. I'm guessing strintmap_init_with_options() is fine as-is, but
> >    I'm wondering if anyone else thinks it looks weird and if so if there is
> >    anything I should do about it.
>
> You could drop default_value from strintmap_init_with_options(). I'd
> _guess_ most callers would be happy with 0, but you'd know much better
> than I what your first crop of callers will want.
>
> I'm happy with it either way.
>
> > Things Peff mentioned on v2 that I did NOT do:
> >
> >  * Peff brought up some questions about mapping strintmap to an int rather
> >    than an unsigned or intptr_t. I discussed my rationale in the thread
>
> Yeah, I'm well convinced that what you have here is fine.
>
> > Things Peff mentioned on v1 that are still not included and which Peff
> > didn't comment on for v2, but which may still be worth mentioning again:
> >
> >  * Peff brought up the idea of having a free_values member instead of having
> >    a free_values parameter to strmap_clear(). That'd just mean moving the
> >    parameter from strmap_clear() to strmap_init() and would be easy to do,
> >    but he sounded like he was just throwing it out as an idea and I didn't
> >    have a strong opinion, so I left it as-is. If others have
> >    opinions/preferences, changing it is easy right now.
>
> Yeah, I was mostly thinking out loud. What you have here looks fine to
> me.
>
> >  * Peff early on wanted the strmap_entry type to have a char key[FLEX_ALLOC]
> >    instead of having a (const) char *key. I spent a couple more days on this
> >    despite him not mentioning it while reviewing v2, and finally got it
> >    working this time and running valgrind-free. Note that such a change
> >    means always copying the key instead of allowing it as an option. After
> >    implementing it, I timed it and it slowed down my important testcase by
> >    just over 6%. So I chucked it. I think the FLEXPTR_ALLOC_STR usage in
> >    combination with defaulting to strdup_strings=1 gives us most the
> >    benefits Peff wanted, while still allowing merge-ort to reuse strings
> >    when it's important.
>
> Yes, I'd agree that FLEXPTR is a good middle ground. If I really manage
> to find a caller later where I think the complexity might be worth
> saving a few bytes, perhaps I'll try it then and get some real
> measurements. My guess is that won't ever actually happen. :)
>
> -Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 00/13] Add struct strmap and associated utility functions
  2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
                       ` (13 preceding siblings ...)
  2020-11-04 20:52     ` [PATCH v3 00/13] Add struct strmap and associated utility functions Jeff King
@ 2020-11-05  0:22     ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
                         ` (14 more replies)
  14 siblings, 15 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

Here I introduce a new strmap type (and strintmap and strset), which my new
merge backed, merge-ort, uses heavily. (I also made significant use of it in
my changes to diffcore-rename). This strmap type was based on Peff's
proposal from a couple years ago[1], but has additions that I made as I used
it, and a number of additions/changes suggested by Peff in his reviews. I
also start the series off with some changes to hashmap, based on Peff's
feedback on v1 & v2.

NOTE: While en/merge-ort-impl depends on this series, there are no changes
in v4 that affect it so en/merge-ort-impl does not need a reroll.

Changes since v3 (almost all of which were suggestions from Peff):

 * Fix pointer math due to platform differences in FLEX_ALLOC definition,
   and a few other FLEXPTR_ALLOC_STR cleanups
 * Define strmap_for_each_entry in terms of hashmap_for_each_entry instead
   of lower level functions
 * Use simpler _INIT macros
 * Remove strset_check_and_add() from API as per Peff's suggestion
   (merge-ort doesn't need it; we can add it later)
 * Update comments and commit messages to update now obsolete statements due
   to changes from earlier reviews

[1] 
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Elijah Newren (13):
  hashmap: add usage documentation explaining hashmap_free[_entries]()
  hashmap: adjust spacing to fix argument alignment
  hashmap: allow re-use after hashmap_free()
  hashmap: introduce a new hashmap_partial_clear()
  hashmap: provide deallocation function names
  strmap: new utility functions
  strmap: add more utility functions
  strmap: enable faster clearing and reusing of strmaps
  strmap: add functions facilitating use as a string->int map
  strmap: add a strset sub-type
  strmap: enable allocations to come from a mem_pool
  strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  Use new HASHMAP_INIT macro to simplify hashmap initialization

 Makefile                |   1 +
 add-interactive.c       |   2 +-
 attr.c                  |  26 ++--
 blame.c                 |   2 +-
 bloom.c                 |   5 +-
 builtin/difftool.c      |   9 +-
 builtin/fetch.c         |   6 +-
 builtin/shortlog.c      |   2 +-
 config.c                |   2 +-
 diff.c                  |   4 +-
 diffcore-rename.c       |   2 +-
 dir.c                   |   8 +-
 hashmap.c               |  74 +++++++----
 hashmap.h               |  91 +++++++++++---
 merge-recursive.c       |   6 +-
 name-hash.c             |   4 +-
 object.c                |   2 +-
 oidmap.c                |   2 +-
 patch-ids.c             |   2 +-
 range-diff.c            |   6 +-
 ref-filter.c            |   2 +-
 revision.c              |  11 +-
 sequencer.c             |   4 +-
 strmap.c                | 151 ++++++++++++++++++++++
 strmap.h                | 270 ++++++++++++++++++++++++++++++++++++++++
 submodule-config.c      |   4 +-
 t/helper/test-hashmap.c |   9 +-
 27 files changed, 593 insertions(+), 114 deletions(-)
 create mode 100644 strmap.c
 create mode 100644 strmap.h


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-835%2Fnewren%2Fstrmap-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-835/newren/strmap-v4
Pull-Request: https://github.com/git/git/pull/835

Range-diff vs v3:

  1:  af6b6fcb46 =  1:  af6b6fcb46 hashmap: add usage documentation explaining hashmap_free[_entries]()
  2:  591161fd78 =  2:  591161fd78 hashmap: adjust spacing to fix argument alignment
  3:  f2718d036d =  3:  f2718d036d hashmap: allow re-use after hashmap_free()
  4:  61f1da3c51 =  4:  61f1da3c51 hashmap: introduce a new hashmap_partial_clear()
  5:  861e8d65ae =  5:  861e8d65ae hashmap: provide deallocation function names
  6:  448d3b219f =  6:  448d3b219f strmap: new utility functions
  7:  42633b8d03 !  7:  5e8004c728 strmap: add more utility functions
     @@ strmap.h: void *strmap_get(struct strmap *map, const char *str);
      + * iterate through @map using @iter, @var is a pointer to a type strmap_entry
      + */
      +#define strmap_for_each_entry(mystrmap, iter, var)	\
     -+	for (var = hashmap_iter_first_entry_offset(&(mystrmap)->map, iter, 0); \
     -+		var; \
     -+		var = hashmap_iter_next_entry_offset(iter, 0))
     ++	hashmap_for_each_entry(&(mystrmap)->map, iter, var, ent)
      +
       #endif /* STRMAP_H */
  8:  ea942eb803 =  8:  fd96e9fc8d strmap: enable faster clearing and reusing of strmaps
  9:  c1d2172171 !  9:  f499934f54 strmap: add functions facilitating use as a string->int map
     @@ Commit message
          isn't the case when we're storing an int value directly in the void*
          slot instead of using the void* slot as a pointer to data.
      
     -    A note on the name: if anyone has a better name suggestion than
     -    strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
     -    not been able to come up with a better name.
     -
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## strmap.c ##
     @@ strmap.h: int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
       			.strdup_strings = 1,                          \
       		    }
      +#define STRINTMAP_INIT { \
     -+			.map.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
     -+			.map.strdup_strings = 1,                          \
     -+			.default_value = 0,                               \
     -+		    }
     ++			.map = STRMAP_INIT,   \
     ++			.default_value = 0,   \
     ++		       }
       
       /*
        * Initialize the members of the strmap.  Any keys added to the strmap will
      @@ strmap.h: static inline int strmap_empty(struct strmap *map)
     - 		var; \
     - 		var = hashmap_iter_next_entry_offset(iter, 0))
     + #define strmap_for_each_entry(mystrmap, iter, var)	\
     + 	hashmap_for_each_entry(&(mystrmap)->map, iter, var, ent)
       
      +
      +/*
     @@ strmap.h: static inline int strmap_empty(struct strmap *map)
      + * Primary differences:
      + *    1) Since the void* value is just an int in disguise, there is no value
      + *       to free.  (Thus one fewer argument to strintmap_clear)
     -+ *    2) strintmap_get() returns an int; it also requires an extra parameter to
     -+ *       be specified so it knows what value to return if the underlying strmap
     -+ *       has not key matching the given string.
     ++ *    2) strintmap_get() returns an int, or returns the default_value if the
     ++ *       key is not found in the strintmap.
      + *    3) No strmap_put() equivalent; strintmap_set() and strintmap_incr()
      + *       instead.
      + */
 10:  0f57735f5e ! 10:  ee1ec55f1b strmap: add a strset sub-type
     @@ Commit message
          free_values argument to *_clear(), and the *_get() function), and
          strset_add() is chosen as the API instead of strset_put().
      
     -    Finally, shortlog already had a more minimal strset API; so this adds a
     -    strset_check_and_add() function for its benefit to allow it to switch
     -    over to this strset implementation.
     -
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     - ## strmap.c ##
     -@@ strmap.c: void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
     - 	else
     - 		strintmap_set(map, str, map->default_value + amt);
     - }
     -+
     -+int strset_check_and_add(struct strset *set, const char *str)
     -+{
     -+	if (strset_contains(set, str))
     -+		return 1;
     -+	strset_add(set, str);
     -+	return 0;
     -+}
     -
       ## strmap.h ##
      @@ strmap.h: int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
     - 			.map.strdup_strings = 1,                          \
     - 			.default_value = 0,                               \
     - 		    }
     -+#define STRSET_INIT { \
     -+			.map.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
     -+			.map.strdup_strings = 1,                          \
     -+		    }
     + 			.map = STRMAP_INIT,   \
     + 			.default_value = 0,   \
     + 		       }
     ++#define STRSET_INIT { .map = STRMAP_INIT }
       
       /*
        * Initialize the members of the strmap.  Any keys added to the strmap will
     @@ strmap.h: static inline void strintmap_set(struct strintmap *map, const char *st
      +{
      +	strmap_put(&set->map, str, NULL);
      +}
     -+
     -+/* Returns 1 if str already in set.  Otherwise adds str to set and returns 0 */
     -+int strset_check_and_add(struct strset *set, const char *str);
      +
       #endif /* STRMAP_H */
 11:  980537e877 = 11:  73a57045c3 strmap: enable allocations to come from a mem_pool
 12:  7f93cbb525 ! 12:  0352260de4 strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
     @@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_values)
       	}
       }
       
     -@@ strmap.c: void *strmap_put(struct strmap *map, const char *str, void *data)
     - 		old = entry->value;
     +@@ strmap.c: void strmap_partial_clear(struct strmap *map, int free_values)
     + void *strmap_put(struct strmap *map, const char *str, void *data)
     + {
     + 	struct strmap_entry *entry = find_strmap_entry(map, str);
     +-	void *old = NULL;
     + 
     + 	if (entry) {
     +-		old = entry->value;
     ++		void *old = entry->value;
       		entry->value = data;
     - 	} else {
     +-	} else {
      -		const char *key = str;
      -
      -		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
      -				  : xmalloc(sizeof(*entry));
     -+		if (map->strdup_strings) {
     -+			if (!map->pool) {
     -+				FLEXPTR_ALLOC_STR(entry, key, str);
     -+			} else {
     -+				/* Remember +1 for nul byte twice below */
     -+				size_t len = strlen(str);
     -+				entry = mem_pool_alloc(map->pool,
     -+					       st_add3(sizeof(*entry), len, 1));
     -+				memcpy(entry->keydata, str, len+1);
     -+			}
     -+		} else if (!map->pool) {
     -+			entry = xmalloc(sizeof(*entry));
     -+		} else {
     -+			entry = mem_pool_alloc(map->pool, sizeof(*entry));
     -+		}
     - 		hashmap_entry_init(&entry->ent, strhash(str));
     --
     +-		hashmap_entry_init(&entry->ent, strhash(str));
     ++		return old;
     ++	}
     + 
      -		if (map->strdup_strings)
      -			key = map->pool ? mem_pool_strdup(map->pool, str)
      -					: xstrdup(str);
      -		entry->key = key;
     -+		entry->key = map->strdup_strings ? entry->keydata : str;
     - 		entry->value = data;
     - 		hashmap_add(&map->map, &entry->ent);
     +-		entry->value = data;
     +-		hashmap_add(&map->map, &entry->ent);
     ++	if (map->strdup_strings) {
     ++		if (!map->pool) {
     ++			FLEXPTR_ALLOC_STR(entry, key, str);
     ++		} else {
     ++			size_t len = st_add(strlen(str), 1); /* include NUL */
     ++			entry = mem_pool_alloc(map->pool,
     ++					       st_add(sizeof(*entry), len));
     ++			memcpy(entry + 1, str, len);
     ++			entry->key = (void *)(entry + 1);
     ++		}
     ++	} else if (!map->pool) {
     ++		entry = xmalloc(sizeof(*entry));
     ++	} else {
     ++		entry = mem_pool_alloc(map->pool, sizeof(*entry));
       	}
     +-	return old;
     ++	hashmap_entry_init(&entry->ent, strhash(str));
     ++	if (!map->strdup_strings)
     ++		entry->key = str;
     ++	entry->value = data;
     ++	hashmap_add(&map->map, &entry->ent);
     ++	return NULL;
     + }
     + 
     + struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
      @@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_value)
       		return;
       	if (free_value)
     @@ strmap.h: struct strmap_entry {
       	struct hashmap_entry ent;
       	const char *key;
       	void *value;
     -+	char keydata[FLEX_ARRAY]; /* if strdup_strings=1, key == &keydata[0] */
     ++	/* strmap_entry may be allocated extra space to store the key at end */
       };
       
       int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 13:  5f41fc63e5 = 13:  617926540b Use new HASHMAP_INIT macro to simplify hashmap initialization

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
                         ` (13 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The existence of hashmap_free() and hashmap_free_entries() confused me,
and the docs weren't clear enough.  We are dealing with a map table,
entries in that table, and possibly also things each of those entries
point to.  I had to consult other source code examples and the
implementation.  Add a brief note to clarify the differences.  This will
become even more important once we introduce a new
hashmap_partial_clear() function which will add the question of whether
the table itself has been freed.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.h | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/hashmap.h b/hashmap.h
index b011b394fe..2994dc7a9c 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -236,13 +236,40 @@ void hashmap_init(struct hashmap *map,
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
- * Frees a hashmap structure and allocated memory, leaves entries undisturbed
+ * Frees a hashmap structure and allocated memory for the table, but does not
+ * free the entries nor anything they point to.
+ *
+ * Usage note:
+ *
+ * Many callers will need to iterate over all entries and free the data each
+ * entry points to; in such a case, they can free the entry itself while at it.
+ * Thus, you might see:
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *      free(e);
+ *    }
+ *    hashmap_free(map);
+ *
+ * instead of
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *    }
+ *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ *
+ * to avoid the implicit extra loop over the entries.  However, if there are
+ * no special fields in your entry that need to be freed beyond the entry
+ * itself, it is probably simpler to avoid the explicit loop and just call
+ * hashmap_free_entries().
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map
+ * where @member is the hashmap_entry struct used to associate with @map.
+ *
+ * See usage note above hashmap_free().
  */
 #define hashmap_free_entries(map, type, member) \
 	hashmap_free_(map, offsetof(type, member));
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 02/13] hashmap: adjust spacing to fix argument alignment
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
                         ` (12 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

No actual code changes; just whitespace adjustments.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 17 +++++++++--------
 hashmap.h | 22 +++++++++++-----------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index 09813e1a46..e44d8a3e85 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -92,8 +92,9 @@ static void alloc_table(struct hashmap *map, unsigned int size)
 }
 
 static inline int entry_equals(const struct hashmap *map,
-		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
-		const void *keydata)
+			       const struct hashmap_entry *e1,
+			       const struct hashmap_entry *e2,
+			       const void *keydata)
 {
 	return (e1 == e2) ||
 	       (e1->hash == e2->hash &&
@@ -101,7 +102,7 @@ static inline int entry_equals(const struct hashmap *map,
 }
 
 static inline unsigned int bucket(const struct hashmap *map,
-		const struct hashmap_entry *key)
+				  const struct hashmap_entry *key)
 {
 	return key->hash & (map->tablesize - 1);
 }
@@ -148,7 +149,7 @@ static int always_equal(const void *unused_cmp_data,
 }
 
 void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		const void *cmpfn_data, size_t initial_size)
+		  const void *cmpfn_data, size_t initial_size)
 {
 	unsigned int size = HASHMAP_INITIAL_SIZE;
 
@@ -199,7 +200,7 @@ struct hashmap_entry *hashmap_get(const struct hashmap *map,
 }
 
 struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
-			const struct hashmap_entry *entry)
+				       const struct hashmap_entry *entry)
 {
 	struct hashmap_entry *e = entry->next;
 	for (; e; e = e->next)
@@ -225,8 +226,8 @@ void hashmap_add(struct hashmap *map, struct hashmap_entry *entry)
 }
 
 struct hashmap_entry *hashmap_remove(struct hashmap *map,
-					const struct hashmap_entry *key,
-					const void *keydata)
+				     const struct hashmap_entry *key,
+				     const void *keydata)
 {
 	struct hashmap_entry *old;
 	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
@@ -249,7 +250,7 @@ struct hashmap_entry *hashmap_remove(struct hashmap *map,
 }
 
 struct hashmap_entry *hashmap_put(struct hashmap *map,
-				struct hashmap_entry *entry)
+				  struct hashmap_entry *entry)
 {
 	struct hashmap_entry *old = hashmap_remove(map, entry, NULL);
 	hashmap_add(map, entry);
diff --git a/hashmap.h b/hashmap.h
index 2994dc7a9c..904f61d6e1 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -228,9 +228,9 @@ struct hashmap {
  * prevent expensive resizing. If 0, the table is dynamically resized.
  */
 void hashmap_init(struct hashmap *map,
-			 hashmap_cmp_fn equals_function,
-			 const void *equals_function_data,
-			 size_t initial_size);
+		  hashmap_cmp_fn equals_function,
+		  const void *equals_function_data,
+		  size_t initial_size);
 
 /* internal function for freeing hashmap */
 void hashmap_free_(struct hashmap *map, ssize_t offset);
@@ -288,7 +288,7 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  * and if it is on stack, you can just let it go out of scope).
  */
 static inline void hashmap_entry_init(struct hashmap_entry *e,
-					unsigned int hash)
+				      unsigned int hash)
 {
 	e->hash = hash;
 	e->next = NULL;
@@ -330,8 +330,8 @@ static inline unsigned int hashmap_get_size(struct hashmap *map)
  * to `hashmap_cmp_fn` to decide whether the entry matches the key.
  */
 struct hashmap_entry *hashmap_get(const struct hashmap *map,
-				const struct hashmap_entry *key,
-				const void *keydata);
+				  const struct hashmap_entry *key,
+				  const void *keydata);
 
 /*
  * Returns the hashmap entry for the specified hash code and key data,
@@ -364,7 +364,7 @@ static inline struct hashmap_entry *hashmap_get_from_hash(
  * call to `hashmap_get` or `hashmap_get_next`.
  */
 struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
-			const struct hashmap_entry *entry);
+				       const struct hashmap_entry *entry);
 
 /*
  * Adds a hashmap entry. This allows to add duplicate entries (i.e.
@@ -384,7 +384,7 @@ void hashmap_add(struct hashmap *map, struct hashmap_entry *entry);
  * Returns the replaced entry, or NULL if not found (i.e. the entry was added).
  */
 struct hashmap_entry *hashmap_put(struct hashmap *map,
-				struct hashmap_entry *entry);
+				  struct hashmap_entry *entry);
 
 /*
  * Adds or replaces a hashmap entry contained within @keyvar,
@@ -406,8 +406,8 @@ struct hashmap_entry *hashmap_put(struct hashmap *map,
  * Argument explanation is the same as in `hashmap_get`.
  */
 struct hashmap_entry *hashmap_remove(struct hashmap *map,
-					const struct hashmap_entry *key,
-					const void *keydata);
+				     const struct hashmap_entry *key,
+				     const void *keydata);
 
 /*
  * Removes a hashmap entry contained within @keyvar,
@@ -449,7 +449,7 @@ struct hashmap_entry *hashmap_iter_next(struct hashmap_iter *iter);
 
 /* Initializes the iterator and returns the first entry, if any. */
 static inline struct hashmap_entry *hashmap_iter_first(struct hashmap *map,
-		struct hashmap_iter *iter)
+						       struct hashmap_iter *iter)
 {
 	hashmap_iter_init(map, iter);
 	return hashmap_iter_next(iter);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 03/13] hashmap: allow re-use after hashmap_free()
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
                         ` (11 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Previously, once map->table had been freed, any calls to hashmap_put(),
hashmap_get(), or hashmap_remove() would cause a NULL pointer
dereference (since hashmap_free_() also zeros the memory; without that
zeroing, calling these functions would cause a use-after-free problem).

Modify these functions to check for a NULL table and automatically
allocate as needed.

Also add a HASHMAP_INIT(fn, data) macro for initializing hashmaps on the
stack without calling hashmap_init().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 16 ++++++++++++++--
 hashmap.h |  3 +++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index e44d8a3e85..bb7c9979b8 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -114,6 +114,7 @@ int hashmap_bucket(const struct hashmap *map, unsigned int hash)
 
 static void rehash(struct hashmap *map, unsigned int newsize)
 {
+	/* map->table MUST NOT be NULL when this function is called */
 	unsigned int i, oldsize = map->tablesize;
 	struct hashmap_entry **oldtable = map->table;
 
@@ -134,6 +135,7 @@ static void rehash(struct hashmap *map, unsigned int newsize)
 static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
 		const struct hashmap_entry *key, const void *keydata)
 {
+	/* map->table MUST NOT be NULL when this function is called */
 	struct hashmap_entry **e = &map->table[bucket(map, key)];
 	while (*e && !entry_equals(map, *e, key, keydata))
 		e = &(*e)->next;
@@ -196,6 +198,8 @@ struct hashmap_entry *hashmap_get(const struct hashmap *map,
 				const struct hashmap_entry *key,
 				const void *keydata)
 {
+	if (!map->table)
+		return NULL;
 	return *find_entry_ptr(map, key, keydata);
 }
 
@@ -211,8 +215,12 @@ struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
 
 void hashmap_add(struct hashmap *map, struct hashmap_entry *entry)
 {
-	unsigned int b = bucket(map, entry);
+	unsigned int b;
+
+	if (!map->table)
+		alloc_table(map, HASHMAP_INITIAL_SIZE);
 
+	b = bucket(map, entry);
 	/* add entry */
 	entry->next = map->table[b];
 	map->table[b] = entry;
@@ -230,7 +238,11 @@ struct hashmap_entry *hashmap_remove(struct hashmap *map,
 				     const void *keydata)
 {
 	struct hashmap_entry *old;
-	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
+	struct hashmap_entry **e;
+
+	if (!map->table)
+		return NULL;
+	e = find_entry_ptr(map, key, keydata);
 	if (!*e)
 		return NULL;
 
diff --git a/hashmap.h b/hashmap.h
index 904f61d6e1..3b0f2bcade 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -210,6 +210,9 @@ struct hashmap {
 
 /* hashmap functions */
 
+#define HASHMAP_INIT(fn, data) { .cmpfn = fn, .cmpfn_data = data, \
+				 .do_count_items = 1 }
+
 /*
  * Initializes a hashmap structure.
  *
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 04/13] hashmap: introduce a new hashmap_partial_clear()
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (2 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
                         ` (10 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge-ort is a heavy user of strmaps, which are built on hashmap.[ch].
clear_or_reinit_internal_opts() in merge-ort was taking about 12% of
overall runtime in my testcase involving rebasing 35 patches of
linux.git across a big rename.  clear_or_reinit_internal_opts() was
calling hashmap_free() followed by hashmap_init(), meaning that not only
was it freeing all the memory associated with each of the strmaps just
to immediately allocate a new array again, it was allocating a new array
that was likely smaller than needed (thus resulting in later need to
rehash things).  The ending size of the map table on the previous commit
was likely almost perfectly sized for the next commit we wanted to pick,
and not dropping and reallocating the table immediately is a win.

Add some new API to hashmap to clear a hashmap of entries without
freeing map->table (and instead only zeroing it out like alloc_table()
would do, along with zeroing the count of items in the table and the
shrink_at field).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 39 +++++++++++++++++++++++++++------------
 hashmap.h | 13 ++++++++++++-
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index bb7c9979b8..922ed07954 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -174,22 +174,37 @@ void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
 	map->do_count_items = 1;
 }
 
+static void free_individual_entries(struct hashmap *map, ssize_t entry_offset)
+{
+	struct hashmap_iter iter;
+	struct hashmap_entry *e;
+
+	hashmap_iter_init(map, &iter);
+	while ((e = hashmap_iter_next(&iter)))
+		/*
+		 * like container_of, but using caller-calculated
+		 * offset (caller being hashmap_free_entries)
+		 */
+		free((char *)e - entry_offset);
+}
+
+void hashmap_partial_clear_(struct hashmap *map, ssize_t entry_offset)
+{
+	if (!map || !map->table)
+		return;
+	if (entry_offset >= 0)  /* called by hashmap_clear_entries */
+		free_individual_entries(map, entry_offset);
+	memset(map->table, 0, map->tablesize * sizeof(struct hashmap_entry *));
+	map->shrink_at = 0;
+	map->private_size = 0;
+}
+
 void hashmap_free_(struct hashmap *map, ssize_t entry_offset)
 {
 	if (!map || !map->table)
 		return;
-	if (entry_offset >= 0) { /* called by hashmap_free_entries */
-		struct hashmap_iter iter;
-		struct hashmap_entry *e;
-
-		hashmap_iter_init(map, &iter);
-		while ((e = hashmap_iter_next(&iter)))
-			/*
-			 * like container_of, but using caller-calculated
-			 * offset (caller being hashmap_free_entries)
-			 */
-			free((char *)e - entry_offset);
-	}
+	if (entry_offset >= 0)  /* called by hashmap_free_entries */
+		free_individual_entries(map, entry_offset);
 	free(map->table);
 	memset(map, 0, sizeof(*map));
 }
diff --git a/hashmap.h b/hashmap.h
index 3b0f2bcade..e9430d582a 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -235,7 +235,8 @@ void hashmap_init(struct hashmap *map,
 		  const void *equals_function_data,
 		  size_t initial_size);
 
-/* internal function for freeing hashmap */
+/* internal functions for clearing or freeing hashmap */
+void hashmap_partial_clear_(struct hashmap *map, ssize_t offset);
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
@@ -268,6 +269,16 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
+/*
+ * Basically the same as calling hashmap_free() followed by hashmap_init(),
+ * but doesn't incur the overhead of deallocating and reallocating
+ * map->table; it leaves map->table allocated and the same size but zeroes
+ * it out so it's ready for use again as an empty map.  As with
+ * hashmap_free(), you may need to free the entries yourself before calling
+ * this function.
+ */
+#define hashmap_partial_clear(map) hashmap_partial_clear_(map, -1)
+
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
  * where @member is the hashmap_entry struct used to associate with @map.
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 05/13] hashmap: provide deallocation function names
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (3 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 06/13] strmap: new utility functions Elijah Newren via GitGitGadget
                         ` (9 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported.  Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over
to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 add-interactive.c       |  2 +-
 blame.c                 |  2 +-
 bloom.c                 |  2 +-
 builtin/fetch.c         |  6 +++---
 builtin/shortlog.c      |  2 +-
 config.c                |  2 +-
 diff.c                  |  4 ++--
 diffcore-rename.c       |  2 +-
 dir.c                   |  8 ++++----
 hashmap.c               |  6 +++---
 hashmap.h               | 44 +++++++++++++++++++++++++----------------
 merge-recursive.c       |  6 +++---
 name-hash.c             |  4 ++--
 object.c                |  2 +-
 oidmap.c                |  2 +-
 patch-ids.c             |  2 +-
 range-diff.c            |  2 +-
 ref-filter.c            |  2 +-
 revision.c              |  2 +-
 sequencer.c             |  4 ++--
 submodule-config.c      |  4 ++--
 t/helper/test-hashmap.c |  6 +++---
 22 files changed, 63 insertions(+), 53 deletions(-)

diff --git a/add-interactive.c b/add-interactive.c
index 555c4abf32..a14c0feaa2 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -557,7 +557,7 @@ static int get_modified_files(struct repository *r,
 		if (ps)
 			clear_pathspec(&rev.prune_data);
 	}
-	hashmap_free_entries(&s.file_map, struct pathname_entry, ent);
+	hashmap_clear_and_free(&s.file_map, struct pathname_entry, ent);
 	if (unmerged_count)
 		*unmerged_count = s.unmerged_count;
 	if (binary_count)
diff --git a/blame.c b/blame.c
index 686845b2b4..229beb6452 100644
--- a/blame.c
+++ b/blame.c
@@ -435,7 +435,7 @@ static void get_fingerprint(struct fingerprint *result,
 
 static void free_fingerprint(struct fingerprint *f)
 {
-	hashmap_free(&f->map);
+	hashmap_clear(&f->map);
 	free(f->entries);
 }
 
diff --git a/bloom.c b/bloom.c
index 68c73200a5..719c313a1c 100644
--- a/bloom.c
+++ b/bloom.c
@@ -287,7 +287,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 		}
 
 	cleanup:
-		hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry);
+		hashmap_clear_and_free(&pathmap, struct pathmap_hash_entry, entry);
 	} else {
 		for (i = 0; i < diff_queued_diff.nr; i++)
 			diff_free_filepair(diff_queued_diff.queue[i]);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f9c3c49f14..ecf8537605 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -393,7 +393,7 @@ static void find_non_local_tags(const struct ref *refs,
 		item = refname_hash_add(&remote_refs, ref->name, &ref->old_oid);
 		string_list_insert(&remote_refs_list, ref->name);
 	}
-	hashmap_free_entries(&existing_refs, struct refname_hash_entry, ent);
+	hashmap_clear_and_free(&existing_refs, struct refname_hash_entry, ent);
 
 	/*
 	 * We may have a final lightweight tag that needs to be
@@ -428,7 +428,7 @@ static void find_non_local_tags(const struct ref *refs,
 		**tail = rm;
 		*tail = &rm->next;
 	}
-	hashmap_free_entries(&remote_refs, struct refname_hash_entry, ent);
+	hashmap_clear_and_free(&remote_refs, struct refname_hash_entry, ent);
 	string_list_clear(&remote_refs_list, 0);
 	oidset_clear(&fetch_oids);
 }
@@ -573,7 +573,7 @@ static struct ref *get_ref_map(struct remote *remote,
 		}
 	}
 	if (existing_refs_populated)
-		hashmap_free_entries(&existing_refs, struct refname_hash_entry, ent);
+		hashmap_clear_and_free(&existing_refs, struct refname_hash_entry, ent);
 
 	return ref_map;
 }
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 0a5c4968f6..83f0a739b4 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -220,7 +220,7 @@ static void strset_clear(struct strset *ss)
 {
 	if (!ss->map.table)
 		return;
-	hashmap_free_entries(&ss->map, struct strset_item, ent);
+	hashmap_clear_and_free(&ss->map, struct strset_item, ent);
 }
 
 static void insert_records_from_trailers(struct shortlog *log,
diff --git a/config.c b/config.c
index 2bdff4457b..8f324ed3a6 100644
--- a/config.c
+++ b/config.c
@@ -1963,7 +1963,7 @@ void git_configset_clear(struct config_set *cs)
 		free(entry->key);
 		string_list_clear(&entry->value_list, 1);
 	}
-	hashmap_free_entries(&cs->config_hash, struct config_set_element, ent);
+	hashmap_clear_and_free(&cs->config_hash, struct config_set_element, ent);
 	cs->hash_initialized = 0;
 	free(cs->list.items);
 	cs->list.nr = 0;
diff --git a/diff.c b/diff.c
index 2bb2f8f57e..8e0e59f5cf 100644
--- a/diff.c
+++ b/diff.c
@@ -6289,9 +6289,9 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
 				dim_moved_lines(o);
 
-			hashmap_free_entries(&add_lines, struct moved_entry,
+			hashmap_clear_and_free(&add_lines, struct moved_entry,
 						ent);
-			hashmap_free_entries(&del_lines, struct moved_entry,
+			hashmap_clear_and_free(&del_lines, struct moved_entry,
 						ent);
 		}
 
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 99e63e90f8..d367a6d244 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -407,7 +407,7 @@ static int find_exact_renames(struct diff_options *options)
 		renames += find_identical_files(&file_table, i, options);
 
 	/* Free the hash data structure and entries */
-	hashmap_free_entries(&file_table, struct file_similarity, entry);
+	hashmap_clear_and_free(&file_table, struct file_similarity, entry);
 
 	return renames;
 }
diff --git a/dir.c b/dir.c
index 78387110e6..161dce121e 100644
--- a/dir.c
+++ b/dir.c
@@ -817,8 +817,8 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 
 clear_hashmaps:
 	warning(_("disabling cone pattern matching"));
-	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
-	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->parent_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->recursive_hashmap, struct pattern_entry, ent);
 	pl->use_cone_patterns = 0;
 }
 
@@ -921,8 +921,8 @@ void clear_pattern_list(struct pattern_list *pl)
 		free(pl->patterns[i]);
 	free(pl->patterns);
 	free(pl->filebuf);
-	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
-	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->recursive_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->parent_hashmap, struct pattern_entry, ent);
 
 	memset(pl, 0, sizeof(*pl));
 }
diff --git a/hashmap.c b/hashmap.c
index 922ed07954..5009471800 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -183,7 +183,7 @@ static void free_individual_entries(struct hashmap *map, ssize_t entry_offset)
 	while ((e = hashmap_iter_next(&iter)))
 		/*
 		 * like container_of, but using caller-calculated
-		 * offset (caller being hashmap_free_entries)
+		 * offset (caller being hashmap_clear_and_free)
 		 */
 		free((char *)e - entry_offset);
 }
@@ -199,11 +199,11 @@ void hashmap_partial_clear_(struct hashmap *map, ssize_t entry_offset)
 	map->private_size = 0;
 }
 
-void hashmap_free_(struct hashmap *map, ssize_t entry_offset)
+void hashmap_clear_(struct hashmap *map, ssize_t entry_offset)
 {
 	if (!map || !map->table)
 		return;
-	if (entry_offset >= 0)  /* called by hashmap_free_entries */
+	if (entry_offset >= 0)  /* called by hashmap_clear_and_free */
 		free_individual_entries(map, entry_offset);
 	free(map->table);
 	memset(map, 0, sizeof(*map));
diff --git a/hashmap.h b/hashmap.h
index e9430d582a..7251687d73 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -96,7 +96,7 @@
  *         }
  *
  *         if (!strcmp("end", action)) {
- *             hashmap_free_entries(&map, struct long2string, ent);
+ *             hashmap_clear_and_free(&map, struct long2string, ent);
  *             break;
  *         }
  *     }
@@ -237,7 +237,7 @@ void hashmap_init(struct hashmap *map,
 
 /* internal functions for clearing or freeing hashmap */
 void hashmap_partial_clear_(struct hashmap *map, ssize_t offset);
-void hashmap_free_(struct hashmap *map, ssize_t offset);
+void hashmap_clear_(struct hashmap *map, ssize_t offset);
 
 /*
  * Frees a hashmap structure and allocated memory for the table, but does not
@@ -253,40 +253,50 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  *      free(e->somefield);
  *      free(e);
  *    }
- *    hashmap_free(map);
+ *    hashmap_clear(map);
  *
  * instead of
  *
  *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
  *      free(e->somefield);
  *    }
- *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ *    hashmap_clear_and_free(map, struct my_entry_struct, hashmap_entry_name);
  *
  * to avoid the implicit extra loop over the entries.  However, if there are
  * no special fields in your entry that need to be freed beyond the entry
  * itself, it is probably simpler to avoid the explicit loop and just call
- * hashmap_free_entries().
+ * hashmap_clear_and_free().
  */
-#define hashmap_free(map) hashmap_free_(map, -1)
+#define hashmap_clear(map) hashmap_clear_(map, -1)
 
 /*
- * Basically the same as calling hashmap_free() followed by hashmap_init(),
- * but doesn't incur the overhead of deallocating and reallocating
- * map->table; it leaves map->table allocated and the same size but zeroes
- * it out so it's ready for use again as an empty map.  As with
- * hashmap_free(), you may need to free the entries yourself before calling
- * this function.
+ * Similar to hashmap_clear(), except that the table is no deallocated; it
+ * is merely zeroed out but left the same size as before.  If the hashmap
+ * will be reused, this avoids the overhead of deallocating and
+ * reallocating map->table.  As with hashmap_clear(), you may need to free
+ * the entries yourself before calling this function.
  */
 #define hashmap_partial_clear(map) hashmap_partial_clear_(map, -1)
 
 /*
- * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map.
+ * Similar to hashmap_clear() but also frees all entries.  @type is the
+ * struct type of the entry where @member is the hashmap_entry struct used
+ * to associate with @map.
  *
- * See usage note above hashmap_free().
+ * See usage note above hashmap_clear().
  */
-#define hashmap_free_entries(map, type, member) \
-	hashmap_free_(map, offsetof(type, member));
+#define hashmap_clear_and_free(map, type, member) \
+	hashmap_clear_(map, offsetof(type, member))
+
+/*
+ * Similar to hashmap_partial_clear() but also frees all entries.  @type is
+ * the struct type of the entry where @member is the hashmap_entry struct
+ * used to associate with @map.
+ *
+ * See usage note above hashmap_clear().
+ */
+#define hashmap_partial_clear_and_free(map, type, member) \
+	hashmap_partial_clear_(map, offsetof(type, member))
 
 /* hashmap_entry functions */
 
diff --git a/merge-recursive.c b/merge-recursive.c
index d0214335a7..f736a0f632 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2651,7 +2651,7 @@ static struct string_list *get_renames(struct merge_options *opt,
 		free(e->target_file);
 		string_list_clear(&e->source_files, 0);
 	}
-	hashmap_free_entries(&collisions, struct collision_entry, ent);
+	hashmap_clear_and_free(&collisions, struct collision_entry, ent);
 	return renames;
 }
 
@@ -2870,7 +2870,7 @@ static void initial_cleanup_rename(struct diff_queue_struct *pairs,
 		strbuf_release(&e->new_dir);
 		/* possible_new_dirs already cleared in get_directory_renames */
 	}
-	hashmap_free_entries(dir_renames, struct dir_rename_entry, ent);
+	hashmap_clear_and_free(dir_renames, struct dir_rename_entry, ent);
 	free(dir_renames);
 
 	free(pairs->queue);
@@ -3497,7 +3497,7 @@ static int merge_trees_internal(struct merge_options *opt,
 		string_list_clear(entries, 1);
 		free(entries);
 
-		hashmap_free_entries(&opt->priv->current_file_dir_set,
+		hashmap_clear_and_free(&opt->priv->current_file_dir_set,
 					struct path_hashmap_entry, e);
 
 		if (clean < 0) {
diff --git a/name-hash.c b/name-hash.c
index fb526a3775..5d3c7b12c1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -726,6 +726,6 @@ void free_name_hash(struct index_state *istate)
 		return;
 	istate->name_hash_initialized = 0;
 
-	hashmap_free(&istate->name_hash);
-	hashmap_free_entries(&istate->dir_hash, struct dir_entry, ent);
+	hashmap_clear(&istate->name_hash);
+	hashmap_clear_and_free(&istate->dir_hash, struct dir_entry, ent);
 }
diff --git a/object.c b/object.c
index 3257518656..b8406409d5 100644
--- a/object.c
+++ b/object.c
@@ -532,7 +532,7 @@ void raw_object_store_clear(struct raw_object_store *o)
 	close_object_store(o);
 	o->packed_git = NULL;
 
-	hashmap_free(&o->pack_map);
+	hashmap_clear(&o->pack_map);
 }
 
 void parsed_object_pool_clear(struct parsed_object_pool *o)
diff --git a/oidmap.c b/oidmap.c
index 423aa014a3..286a04a53c 100644
--- a/oidmap.c
+++ b/oidmap.c
@@ -27,7 +27,7 @@ void oidmap_free(struct oidmap *map, int free_entries)
 		return;
 
 	/* TODO: make oidmap itself not depend on struct layouts */
-	hashmap_free_(&map->map, free_entries ? 0 : -1);
+	hashmap_clear_(&map->map, free_entries ? 0 : -1);
 }
 
 void *oidmap_get(const struct oidmap *map, const struct object_id *key)
diff --git a/patch-ids.c b/patch-ids.c
index 12aa6d494b..21973e4933 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -71,7 +71,7 @@ int init_patch_ids(struct repository *r, struct patch_ids *ids)
 
 int free_patch_ids(struct patch_ids *ids)
 {
-	hashmap_free_entries(&ids->patches, struct patch_id, ent);
+	hashmap_clear_and_free(&ids->patches, struct patch_id, ent);
 	return 0;
 }
 
diff --git a/range-diff.c b/range-diff.c
index 24dc435e48..befeecae44 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -266,7 +266,7 @@ static void find_exact_matches(struct string_list *a, struct string_list *b)
 		}
 	}
 
-	hashmap_free(&map);
+	hashmap_clear(&map);
 }
 
 static void diffsize_consume(void *data, char *line, unsigned long len)
diff --git a/ref-filter.c b/ref-filter.c
index c62f6b4822..5e66b8cd76 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2222,7 +2222,7 @@ void ref_array_clear(struct ref_array *array)
 	used_atom_cnt = 0;
 
 	if (ref_to_worktree_map.worktrees) {
-		hashmap_free_entries(&(ref_to_worktree_map.map),
+		hashmap_clear_and_free(&(ref_to_worktree_map.map),
 					struct ref_to_worktree_entry, ent);
 		free_worktrees(ref_to_worktree_map.worktrees);
 		ref_to_worktree_map.worktrees = NULL;
diff --git a/revision.c b/revision.c
index aa62212040..f27649d45d 100644
--- a/revision.c
+++ b/revision.c
@@ -139,7 +139,7 @@ static void paths_and_oids_clear(struct hashmap *map)
 		free(entry->path);
 	}
 
-	hashmap_free_entries(map, struct path_and_oids_entry, ent);
+	hashmap_clear_and_free(map, struct path_and_oids_entry, ent);
 }
 
 static void paths_and_oids_insert(struct hashmap *map,
diff --git a/sequencer.c b/sequencer.c
index 00acb12496..23a09c3e7a 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -5058,7 +5058,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 
 	oidmap_free(&commit2todo, 1);
 	oidmap_free(&state.commit2label, 1);
-	hashmap_free_entries(&state.labels, struct labels_entry, entry);
+	hashmap_clear_and_free(&state.labels, struct labels_entry, entry);
 	strbuf_release(&state.buf);
 
 	return 0;
@@ -5577,7 +5577,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 	for (i = 0; i < todo_list->nr; i++)
 		free(subjects[i]);
 	free(subjects);
-	hashmap_free_entries(&subject2item, struct subject2item_entry, entry);
+	hashmap_clear_and_free(&subject2item, struct subject2item_entry, entry);
 
 	clear_commit_todo_item(&commit_todo);
 
diff --git a/submodule-config.c b/submodule-config.c
index c569e22aa3..f502505566 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -103,8 +103,8 @@ static void submodule_cache_clear(struct submodule_cache *cache)
 				ent /* member name */)
 		free_one_config(entry);
 
-	hashmap_free_entries(&cache->for_path, struct submodule_entry, ent);
-	hashmap_free_entries(&cache->for_name, struct submodule_entry, ent);
+	hashmap_clear_and_free(&cache->for_path, struct submodule_entry, ent);
+	hashmap_clear_and_free(&cache->for_name, struct submodule_entry, ent);
 	cache->initialized = 0;
 	cache->gitmodules_read = 0;
 }
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index f38706216f..2475663b49 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -110,7 +110,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 				hashmap_add(&map, &entries[i]->ent);
 			}
 
-			hashmap_free(&map);
+			hashmap_clear(&map);
 		}
 	} else {
 		/* test map lookups */
@@ -130,7 +130,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 			}
 		}
 
-		hashmap_free(&map);
+		hashmap_clear(&map);
 	}
 }
 
@@ -262,6 +262,6 @@ int cmd__hashmap(int argc, const char **argv)
 	}
 
 	strbuf_release(&line);
-	hashmap_free_entries(&map, struct test_entry, ent);
+	hashmap_clear_and_free(&map, struct test_entry, ent);
 	return 0;
 }
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 06/13] strmap: new utility functions
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (4 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 07/13] strmap: add more " Elijah Newren via GitGitGadget
                         ` (8 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add strmap as a new struct and associated utility functions,
specifically for hashmaps that map strings to some value.  The API is
taken directly from Peff's proposal at
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Note that similar string-list, I have a strdup_strings setting.
However, unlike string-list, strmap_init() does not take a parameter for
this setting and instead automatically sets it to 1; callers who want to
control this detail need to instead call strmap_init_with_options().
(Future patches will add additional parameters to
strmap_init_with_options()).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Makefile |  1 +
 strmap.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 strmap.h | 65 +++++++++++++++++++++++++++++++++++++
 3 files changed, 165 insertions(+)
 create mode 100644 strmap.c
 create mode 100644 strmap.h

diff --git a/Makefile b/Makefile
index 95571ee3fc..777a34c01c 100644
--- a/Makefile
+++ b/Makefile
@@ -1000,6 +1000,7 @@ LIB_OBJS += stable-qsort.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
+LIB_OBJS += strmap.o
 LIB_OBJS += strvec.o
 LIB_OBJS += sub-process.o
 LIB_OBJS += submodule-config.o
diff --git a/strmap.c b/strmap.c
new file mode 100644
index 0000000000..53f284eb20
--- /dev/null
+++ b/strmap.c
@@ -0,0 +1,99 @@
+#include "git-compat-util.h"
+#include "strmap.h"
+
+int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
+		     const struct hashmap_entry *entry1,
+		     const struct hashmap_entry *entry2,
+		     const void *keydata)
+{
+	const struct strmap_entry *e1, *e2;
+
+	e1 = container_of(entry1, const struct strmap_entry, ent);
+	e2 = container_of(entry2, const struct strmap_entry, ent);
+	return strcmp(e1->key, e2->key);
+}
+
+static struct strmap_entry *find_strmap_entry(struct strmap *map,
+					      const char *str)
+{
+	struct strmap_entry entry;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.key = str;
+	return hashmap_get_entry(&map->map, &entry, ent, NULL);
+}
+
+void strmap_init(struct strmap *map)
+{
+	strmap_init_with_options(map, 1);
+}
+
+void strmap_init_with_options(struct strmap *map,
+			      int strdup_strings)
+{
+	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
+	map->strdup_strings = strdup_strings;
+}
+
+static void strmap_free_entries_(struct strmap *map, int free_values)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+
+	if (!map)
+		return;
+
+	/*
+	 * We need to iterate over the hashmap entries and free
+	 * e->key and e->value ourselves; hashmap has no API to
+	 * take care of that for us.  Since we're already iterating over
+	 * the hashmap, though, might as well free e too and avoid the need
+	 * to make some call into the hashmap API to do that.
+	 */
+	hashmap_for_each_entry(&map->map, &iter, e, ent) {
+		if (free_values)
+			free(e->value);
+		if (map->strdup_strings)
+			free((char*)e->key);
+		free(e);
+	}
+}
+
+void strmap_clear(struct strmap *map, int free_values)
+{
+	strmap_free_entries_(map, free_values);
+	hashmap_clear(&map->map);
+}
+
+void *strmap_put(struct strmap *map, const char *str, void *data)
+{
+	struct strmap_entry *entry = find_strmap_entry(map, str);
+	void *old = NULL;
+
+	if (entry) {
+		old = entry->value;
+		entry->value = data;
+	} else {
+		const char *key = str;
+
+		entry = xmalloc(sizeof(*entry));
+		hashmap_entry_init(&entry->ent, strhash(str));
+
+		if (map->strdup_strings)
+			key = xstrdup(str);
+		entry->key = key;
+		entry->value = data;
+		hashmap_add(&map->map, &entry->ent);
+	}
+	return old;
+}
+
+void *strmap_get(struct strmap *map, const char *str)
+{
+	struct strmap_entry *entry = find_strmap_entry(map, str);
+	return entry ? entry->value : NULL;
+}
+
+int strmap_contains(struct strmap *map, const char *str)
+{
+	return find_strmap_entry(map, str) != NULL;
+}
diff --git a/strmap.h b/strmap.h
new file mode 100644
index 0000000000..96888c23ad
--- /dev/null
+++ b/strmap.h
@@ -0,0 +1,65 @@
+#ifndef STRMAP_H
+#define STRMAP_H
+
+#include "hashmap.h"
+
+struct strmap {
+	struct hashmap map;
+	unsigned int strdup_strings:1;
+};
+
+struct strmap_entry {
+	struct hashmap_entry ent;
+	const char *key;
+	void *value;
+};
+
+int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
+		     const struct hashmap_entry *entry1,
+		     const struct hashmap_entry *entry2,
+		     const void *keydata);
+
+#define STRMAP_INIT { \
+			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
+			.strdup_strings = 1,                          \
+		    }
+
+/*
+ * Initialize the members of the strmap.  Any keys added to the strmap will
+ * be strdup'ed with their memory managed by the strmap.
+ */
+void strmap_init(struct strmap *map);
+
+/*
+ * Same as strmap_init, but for those who want to control the memory management
+ * carefully instead of using the default of strdup_strings=1.
+ */
+void strmap_init_with_options(struct strmap *map,
+			      int strdup_strings);
+
+/*
+ * Remove all entries from the map, releasing any allocated resources.
+ */
+void strmap_clear(struct strmap *map, int free_values);
+
+/*
+ * Insert "str" into the map, pointing to "data".
+ *
+ * If an entry for "str" already exists, its data pointer is overwritten, and
+ * the original data pointer returned. Otherwise, returns NULL.
+ */
+void *strmap_put(struct strmap *map, const char *str, void *data);
+
+/*
+ * Return the data pointer mapped by "str", or NULL if the entry does not
+ * exist.
+ */
+void *strmap_get(struct strmap *map, const char *str);
+
+/*
+ * Return non-zero iff "str" is present in the map. This differs from
+ * strmap_get() in that it can distinguish entries with a NULL data pointer.
+ */
+int strmap_contains(struct strmap *map, const char *str);
+
+#endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 07/13] strmap: add more utility functions
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (5 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 06/13] strmap: new utility functions Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
                         ` (7 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This adds a number of additional convienence functions I want/need:
  * strmap_get_size()
  * strmap_empty()
  * strmap_remove()
  * strmap_for_each_entry()
  * strmap_get_entry()

I suspect the first four are self-explanatory.

strmap_get_entry() is similar to strmap_get() except that instead of just
returning the void* value that the string maps to, it returns the
strmap_entry that contains both the string and the void* value (or
NULL if the string isn't in the map).  This is helpful because it avoids
multiple lookups, e.g. in some cases a caller would need to call:
  * strmap_contains() to check that the map has an entry for the string
  * strmap_get() to get the void* value
  * <do some work to update the value>
  * strmap_put() to update/overwrite the value
If the void* pointer returned really is a pointer, then the last step is
unnecessary, but if the void* pointer is just cast to an integer then
strmap_put() will be needed.  In contrast, one can call strmap_get_entry()
and then:
  * check if the string was in the map by whether the pointer is NULL
  * access the value via entry->value
  * directly update entry->value
meaning that we can replace two or three hash table lookups with one.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 20 ++++++++++++++++++++
 strmap.h | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/strmap.c b/strmap.c
index 53f284eb20..829f1bc095 100644
--- a/strmap.c
+++ b/strmap.c
@@ -87,6 +87,11 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 	return old;
 }
 
+struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
+{
+	return find_strmap_entry(map, str);
+}
+
 void *strmap_get(struct strmap *map, const char *str)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
@@ -97,3 +102,18 @@ int strmap_contains(struct strmap *map, const char *str)
 {
 	return find_strmap_entry(map, str) != NULL;
 }
+
+void strmap_remove(struct strmap *map, const char *str, int free_value)
+{
+	struct strmap_entry entry, *ret;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.key = str;
+	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
+	if (!ret)
+		return;
+	if (free_value)
+		free(ret->value);
+	if (map->strdup_strings)
+		free((char*)ret->key);
+	free(ret);
+}
diff --git a/strmap.h b/strmap.h
index 96888c23ad..f74bc582e4 100644
--- a/strmap.h
+++ b/strmap.h
@@ -50,6 +50,12 @@ void strmap_clear(struct strmap *map, int free_values);
  */
 void *strmap_put(struct strmap *map, const char *str, void *data);
 
+/*
+ * Return the strmap_entry mapped by "str", or NULL if there is not such
+ * an item in map.
+ */
+struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str);
+
 /*
  * Return the data pointer mapped by "str", or NULL if the entry does not
  * exist.
@@ -62,4 +68,32 @@ void *strmap_get(struct strmap *map, const char *str);
  */
 int strmap_contains(struct strmap *map, const char *str);
 
+/*
+ * Remove the given entry from the strmap.  If the string isn't in the
+ * strmap, the map is not altered.
+ */
+void strmap_remove(struct strmap *map, const char *str, int free_value);
+
+/*
+ * Return how many entries the strmap has.
+ */
+static inline unsigned int strmap_get_size(struct strmap *map)
+{
+	return hashmap_get_size(&map->map);
+}
+
+/*
+ * Return whether the strmap is empty.
+ */
+static inline int strmap_empty(struct strmap *map)
+{
+	return strmap_get_size(map) == 0;
+}
+
+/*
+ * iterate through @map using @iter, @var is a pointer to a type strmap_entry
+ */
+#define strmap_for_each_entry(mystrmap, iter, var)	\
+	hashmap_for_each_entry(&(mystrmap)->map, iter, var, ent)
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 08/13] strmap: enable faster clearing and reusing of strmaps
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (6 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 07/13] strmap: add more " Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
                         ` (6 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

When strmaps are used heavily, such as is done by my new merge-ort
algorithm, and strmaps need to be cleared but then re-used (because of
e.g. picking multiple commits to cherry-pick, or due to a recursive
merge having several different merges while recursing), free-ing and
reallocating map->table repeatedly can add up in time, especially since
it will likely be reallocated to a much smaller size but the previous
merge provides a good guide to the right size to use for the next merge.

Introduce strmap_partial_clear() to take advantage of this type of
situation; it will act similar to strmap_clear() except that
map->table's entries are zeroed instead of map->table being free'd.
Making use of this function reduced the cost of
clear_or_reinit_internal_opts() by about 20% in mert-ort, and dropped
the overall runtime of my rebase testcase by just under 2%.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 6 ++++++
 strmap.h | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/strmap.c b/strmap.c
index 829f1bc095..c410c5241a 100644
--- a/strmap.c
+++ b/strmap.c
@@ -64,6 +64,12 @@ void strmap_clear(struct strmap *map, int free_values)
 	hashmap_clear(&map->map);
 }
 
+void strmap_partial_clear(struct strmap *map, int free_values)
+{
+	strmap_free_entries_(map, free_values);
+	hashmap_partial_clear(&map->map);
+}
+
 void *strmap_put(struct strmap *map, const char *str, void *data)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
diff --git a/strmap.h b/strmap.h
index f74bc582e4..c14fcee148 100644
--- a/strmap.h
+++ b/strmap.h
@@ -42,6 +42,12 @@ void strmap_init_with_options(struct strmap *map,
  */
 void strmap_clear(struct strmap *map, int free_values);
 
+/*
+ * Similar to strmap_clear() but leaves map->map->table allocated and
+ * pre-sized so that subsequent uses won't need as many rehashings.
+ */
+void strmap_partial_clear(struct strmap *map, int free_values);
+
 /*
  * Insert "str" into the map, pointing to "data".
  *
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 09/13] strmap: add functions facilitating use as a string->int map
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (7 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
                         ` (5 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Although strmap could be used as a string->int map, one either had to
allocate an int for every entry and then deallocate later, or one had to
do a bunch of casting between (void*) and (intptr_t).

Add some special functions that do the casting.  Also, rename put->set
for such wrapper functions since 'put' implied there may be some
deallocation needed if the string was already found in the map, which
isn't the case when we're storing an int value directly in the void*
slot instead of using the void* slot as a pointer to data.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 11 +++++++
 strmap.h | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+)

diff --git a/strmap.c b/strmap.c
index c410c5241a..0d10a884b5 100644
--- a/strmap.c
+++ b/strmap.c
@@ -123,3 +123,14 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		free((char*)ret->key);
 	free(ret);
 }
+
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
+{
+	struct strmap_entry *entry = find_strmap_entry(&map->map, str);
+	if (entry) {
+		intptr_t *whence = (intptr_t*)&entry->value;
+		*whence += amt;
+	}
+	else
+		strintmap_set(map, str, map->default_value + amt);
+}
diff --git a/strmap.h b/strmap.h
index c14fcee148..56a5cdb864 100644
--- a/strmap.h
+++ b/strmap.h
@@ -23,6 +23,10 @@ int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
 			.strdup_strings = 1,                          \
 		    }
+#define STRINTMAP_INIT { \
+			.map = STRMAP_INIT,   \
+			.default_value = 0,   \
+		       }
 
 /*
  * Initialize the members of the strmap.  Any keys added to the strmap will
@@ -102,4 +106,94 @@ static inline int strmap_empty(struct strmap *map)
 #define strmap_for_each_entry(mystrmap, iter, var)	\
 	hashmap_for_each_entry(&(mystrmap)->map, iter, var, ent)
 
+
+/*
+ * strintmap:
+ *    A map of string -> int, typecasting the void* of strmap to an int.
+ *
+ * Primary differences:
+ *    1) Since the void* value is just an int in disguise, there is no value
+ *       to free.  (Thus one fewer argument to strintmap_clear)
+ *    2) strintmap_get() returns an int, or returns the default_value if the
+ *       key is not found in the strintmap.
+ *    3) No strmap_put() equivalent; strintmap_set() and strintmap_incr()
+ *       instead.
+ */
+
+struct strintmap {
+	struct strmap map;
+	int default_value;
+};
+
+#define strintmap_for_each_entry(mystrmap, iter, var)	\
+	strmap_for_each_entry(&(mystrmap)->map, iter, var)
+
+static inline void strintmap_init(struct strintmap *map, int default_value)
+{
+	strmap_init(&map->map);
+	map->default_value = default_value;
+}
+
+static inline void strintmap_init_with_options(struct strintmap *map,
+					       int default_value,
+					       int strdup_strings)
+{
+	strmap_init_with_options(&map->map, strdup_strings);
+	map->default_value = default_value;
+}
+
+static inline void strintmap_clear(struct strintmap *map)
+{
+	strmap_clear(&map->map, 0);
+}
+
+static inline void strintmap_partial_clear(struct strintmap *map)
+{
+	strmap_partial_clear(&map->map, 0);
+}
+
+static inline int strintmap_contains(struct strintmap *map, const char *str)
+{
+	return strmap_contains(&map->map, str);
+}
+
+static inline void strintmap_remove(struct strintmap *map, const char *str)
+{
+	return strmap_remove(&map->map, str, 0);
+}
+
+static inline int strintmap_empty(struct strintmap *map)
+{
+	return strmap_empty(&map->map);
+}
+
+static inline unsigned int strintmap_get_size(struct strintmap *map)
+{
+	return strmap_get_size(&map->map);
+}
+
+/*
+ * Returns the value for str in the map.  If str isn't found in the map,
+ * the map's default_value is returned.
+ */
+static inline int strintmap_get(struct strintmap *map, const char *str)
+{
+	struct strmap_entry *result = strmap_get_entry(&map->map, str);
+	if (!result)
+		return map->default_value;
+	return (intptr_t)result->value;
+}
+
+static inline void strintmap_set(struct strintmap *map, const char *str,
+				 intptr_t v)
+{
+	strmap_put(&map->map, str, (void *)v);
+}
+
+/*
+ * Increment the value for str by amt.  If str isn't in the map, add it and
+ * set its value to default_value + amt.
+ */
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 10/13] strmap: add a strset sub-type
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (8 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
                         ` (4 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Similar to adding strintmap for special-casing a string -> int mapping,
add a strset type for cases where we really are only interested in using
strmap for storing a set rather than a mapping.  In this case, we'll
always just store NULL for the value but the different struct type makes
it clearer than code comments how a variable is intended to be used.

The difference in usage also results in some differences in API: a few
things that aren't necessary or meaningful are dropped (namely, the
free_values argument to *_clear(), and the *_get() function), and
strset_add() is chosen as the API instead of strset_put().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/strmap.h b/strmap.h
index 56a5cdb864..22df987644 100644
--- a/strmap.h
+++ b/strmap.h
@@ -27,6 +27,7 @@ int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 			.map = STRMAP_INIT,   \
 			.default_value = 0,   \
 		       }
+#define STRSET_INIT { .map = STRMAP_INIT }
 
 /*
  * Initialize the members of the strmap.  Any keys added to the strmap will
@@ -196,4 +197,68 @@ static inline void strintmap_set(struct strintmap *map, const char *str,
  */
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
 
+/*
+ * strset:
+ *    A set of strings.
+ *
+ * Primary differences with strmap:
+ *    1) The value is always NULL, and ignored.  As there is no value to free,
+ *       there is one fewer argument to strset_clear
+ *    2) No strset_get() because there is no value.
+ *    3) No strset_put(); use strset_add() instead.
+ */
+
+struct strset {
+	struct strmap map;
+};
+
+#define strset_for_each_entry(mystrset, iter, var)	\
+	strmap_for_each_entry(&(mystrset)->map, iter, var)
+
+static inline void strset_init(struct strset *set)
+{
+	strmap_init(&set->map);
+}
+
+static inline void strset_init_with_options(struct strset *set,
+					    int strdup_strings)
+{
+	strmap_init_with_options(&set->map, strdup_strings);
+}
+
+static inline void strset_clear(struct strset *set)
+{
+	strmap_clear(&set->map, 0);
+}
+
+static inline void strset_partial_clear(struct strset *set)
+{
+	strmap_partial_clear(&set->map, 0);
+}
+
+static inline int strset_contains(struct strset *set, const char *str)
+{
+	return strmap_contains(&set->map, str);
+}
+
+static inline void strset_remove(struct strset *set, const char *str)
+{
+	return strmap_remove(&set->map, str, 0);
+}
+
+static inline int strset_empty(struct strset *set)
+{
+	return strmap_empty(&set->map);
+}
+
+static inline unsigned int strset_get_size(struct strset *set)
+{
+	return strmap_get_size(&set->map);
+}
+
+static inline void strset_add(struct strset *set, const char *str)
+{
+	strmap_put(&set->map, str, NULL);
+}
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 11/13] strmap: enable allocations to come from a mem_pool
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (9 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
                         ` (3 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For heavy users of strmaps, allowing the keys and entries to be
allocated from a memory pool can provide significant overhead savings.
Add an option to strmap_init_with_options() to specify a memory pool.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 31 ++++++++++++++++++++++---------
 strmap.h | 11 ++++++++---
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/strmap.c b/strmap.c
index 0d10a884b5..f5904138e1 100644
--- a/strmap.c
+++ b/strmap.c
@@ -1,5 +1,6 @@
 #include "git-compat-util.h"
 #include "strmap.h"
+#include "mem-pool.h"
 
 int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 		     const struct hashmap_entry *entry1,
@@ -24,13 +25,15 @@ static struct strmap_entry *find_strmap_entry(struct strmap *map,
 
 void strmap_init(struct strmap *map)
 {
-	strmap_init_with_options(map, 1);
+	strmap_init_with_options(map, NULL, 1);
 }
 
 void strmap_init_with_options(struct strmap *map,
+			      struct mem_pool *pool,
 			      int strdup_strings)
 {
 	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
+	map->pool = pool;
 	map->strdup_strings = strdup_strings;
 }
 
@@ -42,6 +45,10 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	if (!map)
 		return;
 
+	if (!free_values && map->pool)
+		/* Memory other than util is owned by and freed with the pool */
+		return;
+
 	/*
 	 * We need to iterate over the hashmap entries and free
 	 * e->key and e->value ourselves; hashmap has no API to
@@ -52,9 +59,11 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	hashmap_for_each_entry(&map->map, &iter, e, ent) {
 		if (free_values)
 			free(e->value);
-		if (map->strdup_strings)
-			free((char*)e->key);
-		free(e);
+		if (!map->pool) {
+			if (map->strdup_strings)
+				free((char*)e->key);
+			free(e);
+		}
 	}
 }
 
@@ -81,11 +90,13 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 	} else {
 		const char *key = str;
 
-		entry = xmalloc(sizeof(*entry));
+		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
+				  : xmalloc(sizeof(*entry));
 		hashmap_entry_init(&entry->ent, strhash(str));
 
 		if (map->strdup_strings)
-			key = xstrdup(str);
+			key = map->pool ? mem_pool_strdup(map->pool, str)
+					: xstrdup(str);
 		entry->key = key;
 		entry->value = data;
 		hashmap_add(&map->map, &entry->ent);
@@ -119,9 +130,11 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		return;
 	if (free_value)
 		free(ret->value);
-	if (map->strdup_strings)
-		free((char*)ret->key);
-	free(ret);
+	if (!map->pool) {
+		if (map->strdup_strings)
+			free((char*)ret->key);
+		free(ret);
+	}
 }
 
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
diff --git a/strmap.h b/strmap.h
index 22df987644..b9d882a2d0 100644
--- a/strmap.h
+++ b/strmap.h
@@ -3,8 +3,10 @@
 
 #include "hashmap.h"
 
+struct mempool;
 struct strmap {
 	struct hashmap map;
+	struct mem_pool *pool;
 	unsigned int strdup_strings:1;
 };
 
@@ -37,9 +39,10 @@ void strmap_init(struct strmap *map);
 
 /*
  * Same as strmap_init, but for those who want to control the memory management
- * carefully instead of using the default of strdup_strings=1.
+ * carefully instead of using the default of strdup_strings=1 and pool=NULL.
  */
 void strmap_init_with_options(struct strmap *map,
+			      struct mem_pool *pool,
 			      int strdup_strings);
 
 /*
@@ -137,9 +140,10 @@ static inline void strintmap_init(struct strintmap *map, int default_value)
 
 static inline void strintmap_init_with_options(struct strintmap *map,
 					       int default_value,
+					       struct mem_pool *pool,
 					       int strdup_strings)
 {
-	strmap_init_with_options(&map->map, strdup_strings);
+	strmap_init_with_options(&map->map, pool, strdup_strings);
 	map->default_value = default_value;
 }
 
@@ -221,9 +225,10 @@ static inline void strset_init(struct strset *set)
 }
 
 static inline void strset_init_with_options(struct strset *set,
+					    struct mem_pool *pool,
 					    int strdup_strings)
 {
-	strmap_init_with_options(&set->map, strdup_strings);
+	strmap_init_with_options(&set->map, pool, strdup_strings);
 }
 
 static inline void strset_clear(struct strset *set)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (10 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05  0:22       ` [PATCH v4 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
                         ` (2 subsequent siblings)
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

By default, we do not use a mempool and strdup_strings is true; in this
case, we can avoid both an extra allocation and an extra free by just
over-allocating for the strmap_entry leaving enough space at the end to
copy the key.  FLEXPTR_ALLOC_STR exists for exactly this purpose, so
make use of it.

Also, adjust the case when we are using a memory pool and strdup_strings
is true to just do one allocation from the memory pool instead of two so
that the strmap_clear() and strmap_remove() code can just avoid freeing
the key in all cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 48 +++++++++++++++++++++++++-----------------------
 strmap.h |  1 +
 2 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/strmap.c b/strmap.c
index f5904138e1..98513f7d58 100644
--- a/strmap.c
+++ b/strmap.c
@@ -59,11 +59,8 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	hashmap_for_each_entry(&map->map, &iter, e, ent) {
 		if (free_values)
 			free(e->value);
-		if (!map->pool) {
-			if (map->strdup_strings)
-				free((char*)e->key);
+		if (!map->pool)
 			free(e);
-		}
 	}
 }
 
@@ -82,26 +79,34 @@ void strmap_partial_clear(struct strmap *map, int free_values)
 void *strmap_put(struct strmap *map, const char *str, void *data)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
-	void *old = NULL;
 
 	if (entry) {
-		old = entry->value;
+		void *old = entry->value;
 		entry->value = data;
-	} else {
-		const char *key = str;
-
-		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
-				  : xmalloc(sizeof(*entry));
-		hashmap_entry_init(&entry->ent, strhash(str));
+		return old;
+	}
 
-		if (map->strdup_strings)
-			key = map->pool ? mem_pool_strdup(map->pool, str)
-					: xstrdup(str);
-		entry->key = key;
-		entry->value = data;
-		hashmap_add(&map->map, &entry->ent);
+	if (map->strdup_strings) {
+		if (!map->pool) {
+			FLEXPTR_ALLOC_STR(entry, key, str);
+		} else {
+			size_t len = st_add(strlen(str), 1); /* include NUL */
+			entry = mem_pool_alloc(map->pool,
+					       st_add(sizeof(*entry), len));
+			memcpy(entry + 1, str, len);
+			entry->key = (void *)(entry + 1);
+		}
+	} else if (!map->pool) {
+		entry = xmalloc(sizeof(*entry));
+	} else {
+		entry = mem_pool_alloc(map->pool, sizeof(*entry));
 	}
-	return old;
+	hashmap_entry_init(&entry->ent, strhash(str));
+	if (!map->strdup_strings)
+		entry->key = str;
+	entry->value = data;
+	hashmap_add(&map->map, &entry->ent);
+	return NULL;
 }
 
 struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
@@ -130,11 +135,8 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		return;
 	if (free_value)
 		free(ret->value);
-	if (!map->pool) {
-		if (map->strdup_strings)
-			free((char*)ret->key);
+	if (!map->pool)
 		free(ret);
-	}
 }
 
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
diff --git a/strmap.h b/strmap.h
index b9d882a2d0..d210da5904 100644
--- a/strmap.h
+++ b/strmap.h
@@ -14,6 +14,7 @@ struct strmap_entry {
 	struct hashmap_entry ent;
 	const char *key;
 	void *value;
+	/* strmap_entry may be allocated extra space to store the key at end */
 };
 
 int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v4 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (11 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
@ 2020-11-05  0:22       ` Elijah Newren via GitGitGadget
  2020-11-05 13:29       ` [PATCH v4 00/13] Add struct strmap and associated utility functions Jeff King
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
  14 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-05  0:22 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Now that hashamp has lazy initialization and a HASHMAP_INIT macro,
hashmaps allocated on the stack can be initialized without a call to
hashmap_init() and in some cases makes the code a bit shorter.  Convert
some callsites over to take advantage of this.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 attr.c                  | 26 ++++++++------------------
 bloom.c                 |  3 +--
 builtin/difftool.c      |  9 ++++-----
 range-diff.c            |  4 +---
 revision.c              |  9 +--------
 t/helper/test-hashmap.c |  3 +--
 6 files changed, 16 insertions(+), 38 deletions(-)

diff --git a/attr.c b/attr.c
index a826b2ef1f..4ef85d668b 100644
--- a/attr.c
+++ b/attr.c
@@ -52,13 +52,6 @@ static inline void hashmap_unlock(struct attr_hashmap *map)
 	pthread_mutex_unlock(&map->mutex);
 }
 
-/*
- * The global dictionary of all interned attributes.  This
- * is a singleton object which is shared between threads.
- * Access to this dictionary must be surrounded with a mutex.
- */
-static struct attr_hashmap g_attr_hashmap;
-
 /* The container for objects stored in "struct attr_hashmap" */
 struct attr_hash_entry {
 	struct hashmap_entry ent;
@@ -80,11 +73,14 @@ static int attr_hash_entry_cmp(const void *unused_cmp_data,
 	return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
 }
 
-/* Initialize an 'attr_hashmap' object */
-static void attr_hashmap_init(struct attr_hashmap *map)
-{
-	hashmap_init(&map->map, attr_hash_entry_cmp, NULL, 0);
-}
+/*
+ * The global dictionary of all interned attributes.  This
+ * is a singleton object which is shared between threads.
+ * Access to this dictionary must be surrounded with a mutex.
+ */
+static struct attr_hashmap g_attr_hashmap = {
+	HASHMAP_INIT(attr_hash_entry_cmp, NULL)
+};
 
 /*
  * Retrieve the 'value' stored in a hashmap given the provided 'key'.
@@ -96,9 +92,6 @@ static void *attr_hashmap_get(struct attr_hashmap *map,
 	struct attr_hash_entry k;
 	struct attr_hash_entry *e;
 
-	if (!map->map.tablesize)
-		attr_hashmap_init(map);
-
 	hashmap_entry_init(&k.ent, memhash(key, keylen));
 	k.key = key;
 	k.keylen = keylen;
@@ -114,9 +107,6 @@ static void attr_hashmap_add(struct attr_hashmap *map,
 {
 	struct attr_hash_entry *e;
 
-	if (!map->map.tablesize)
-		attr_hashmap_init(map);
-
 	e = xmalloc(sizeof(struct attr_hash_entry));
 	hashmap_entry_init(&e->ent, memhash(key, keylen));
 	e->key = key;
diff --git a/bloom.c b/bloom.c
index 719c313a1c..b176f28f53 100644
--- a/bloom.c
+++ b/bloom.c
@@ -229,10 +229,9 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	diffcore_std(&diffopt);
 
 	if (diff_queued_diff.nr <= settings->max_changed_paths) {
-		struct hashmap pathmap;
+		struct hashmap pathmap = HASHMAP_INIT(pathmap_cmp, NULL);
 		struct pathmap_hash_entry *e;
 		struct hashmap_iter iter;
-		hashmap_init(&pathmap, pathmap_cmp, NULL, 0);
 
 		for (i = 0; i < diff_queued_diff.nr; i++) {
 			const char *path = diff_queued_diff.queue[i]->two->path;
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 7ac432b881..6e18e623fd 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -342,7 +342,10 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	const char *workdir, *tmp;
 	int ret = 0, i;
 	FILE *fp;
-	struct hashmap working_tree_dups, submodules, symlinks2;
+	struct hashmap working_tree_dups = HASHMAP_INIT(working_tree_entry_cmp,
+							NULL);
+	struct hashmap submodules = HASHMAP_INIT(pair_cmp, NULL);
+	struct hashmap symlinks2 = HASHMAP_INIT(pair_cmp, NULL);
 	struct hashmap_iter iter;
 	struct pair_entry *entry;
 	struct index_state wtindex;
@@ -383,10 +386,6 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	rdir_len = rdir.len;
 	wtdir_len = wtdir.len;
 
-	hashmap_init(&working_tree_dups, working_tree_entry_cmp, NULL, 0);
-	hashmap_init(&submodules, pair_cmp, NULL, 0);
-	hashmap_init(&symlinks2, pair_cmp, NULL, 0);
-
 	child.no_stdin = 1;
 	child.git_cmd = 1;
 	child.use_shell = 0;
diff --git a/range-diff.c b/range-diff.c
index befeecae44..b9950f10c8 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -232,11 +232,9 @@ static int patch_util_cmp(const void *dummy, const struct patch_util *a,
 
 static void find_exact_matches(struct string_list *a, struct string_list *b)
 {
-	struct hashmap map;
+	struct hashmap map = HASHMAP_INIT((hashmap_cmp_fn)patch_util_cmp, NULL);
 	int i;
 
-	hashmap_init(&map, (hashmap_cmp_fn)patch_util_cmp, NULL, 0);
-
 	/* First, add the patches of a to a hash map */
 	for (i = 0; i < a->nr; i++) {
 		struct patch_util *util = a->items[i].util;
diff --git a/revision.c b/revision.c
index f27649d45d..c6e169e3eb 100644
--- a/revision.c
+++ b/revision.c
@@ -124,11 +124,6 @@ static int path_and_oids_cmp(const void *hashmap_cmp_fn_data,
 	return strcmp(e1->path, e2->path);
 }
 
-static void paths_and_oids_init(struct hashmap *map)
-{
-	hashmap_init(map, path_and_oids_cmp, NULL, 0);
-}
-
 static void paths_and_oids_clear(struct hashmap *map)
 {
 	struct hashmap_iter iter;
@@ -213,7 +208,7 @@ void mark_trees_uninteresting_sparse(struct repository *r,
 				     struct oidset *trees)
 {
 	unsigned has_interesting = 0, has_uninteresting = 0;
-	struct hashmap map;
+	struct hashmap map = HASHMAP_INIT(path_and_oids_cmp, NULL);
 	struct hashmap_iter map_iter;
 	struct path_and_oids_entry *entry;
 	struct object_id *oid;
@@ -237,8 +232,6 @@ void mark_trees_uninteresting_sparse(struct repository *r,
 	if (!has_uninteresting || !has_interesting)
 		return;
 
-	paths_and_oids_init(&map);
-
 	oidset_iter_init(trees, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		struct tree *tree = lookup_tree(r, oid);
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index 2475663b49..36ff07bd4b 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -151,12 +151,11 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 int cmd__hashmap(int argc, const char **argv)
 {
 	struct strbuf line = STRBUF_INIT;
-	struct hashmap map;
 	int icase;
+	struct hashmap map = HASHMAP_INIT(test_entry_cmp, &icase);
 
 	/* init hash map */
 	icase = argc > 1 && !strcmp("ignorecase", argv[1]);
-	hashmap_init(&map, test_entry_cmp, &icase, 0);
 
 	/* process commands from stdin */
 	while (strbuf_getline(&line, stdin) != EOF) {
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/13] Add struct strmap and associated utility functions
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (12 preceding siblings ...)
  2020-11-05  0:22       ` [PATCH v4 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
@ 2020-11-05 13:29       ` Jeff King
  2020-11-05 20:25         ` Junio C Hamano
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
  14 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-11-05 13:29 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Thu, Nov 05, 2020 at 12:22:32AM +0000, Elijah Newren via GitGitGadget wrote:

> Changes since v3 (almost all of which were suggestions from Peff):
> 
>  * Fix pointer math due to platform differences in FLEX_ALLOC definition,
>    and a few other FLEXPTR_ALLOC_STR cleanups
>  * Define strmap_for_each_entry in terms of hashmap_for_each_entry instead
>    of lower level functions
>  * Use simpler _INIT macros
>  * Remove strset_check_and_add() from API as per Peff's suggestion
>    (merge-ort doesn't need it; we can add it later)
>  * Update comments and commit messages to update now obsolete statements due
>    to changes from earlier reviews

Thanks, this version looks good to me.

I think we might as well do this on top now:

-- >8 --
Subject: [PATCH] shortlog: drop custom strset implementation

We can use the strset recently added in strmap.h instead. Note that this
doesn't have a "check_and_add" function. We can easily write the same
thing using separate "contains" and "adds" calls. This is slightly less
efficient, in that it hashes the string twice, but for our use here it
shouldn't be a big deal either way.

I did leave it as a separate helper function, though, since we use it in
three separate spots (some of which are in the middle of a conditional).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 59 ++++++----------------------------------------
 1 file changed, 7 insertions(+), 52 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 83f0a739b4..2d036ceec2 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -10,6 +10,7 @@
 #include "shortlog.h"
 #include "parse-options.h"
 #include "trailer.h"
+#include "strmap.h"
 
 static char const * const shortlog_usage[] = {
 	N_("git shortlog [<options>] [<revision-range>] [[--] <path>...]"),
@@ -169,60 +170,14 @@ static void read_from_stdin(struct shortlog *log)
 	strbuf_release(&oneline);
 }
 
-struct strset_item {
-	struct hashmap_entry ent;
-	char value[FLEX_ARRAY];
-};
-
-struct strset {
-	struct hashmap map;
-};
-
-#define STRSET_INIT { { NULL } }
-
-static int strset_item_hashcmp(const void *hash_data,
-			       const struct hashmap_entry *entry,
-			       const struct hashmap_entry *entry_or_key,
-			       const void *keydata)
+static int check_dup(struct strset *dups, const char *str)
 {
-	const struct strset_item *a, *b;
-
-	a = container_of(entry, const struct strset_item, ent);
-	if (keydata)
-		return strcmp(a->value, keydata);
-
-	b = container_of(entry_or_key, const struct strset_item, ent);
-	return strcmp(a->value, b->value);
-}
-
-/*
- * Adds "str" to the set if it was not already present; returns true if it was
- * already there.
- */
-static int strset_check_and_add(struct strset *ss, const char *str)
-{
-	unsigned int hash = strhash(str);
-	struct strset_item *item;
-
-	if (!ss->map.table)
-		hashmap_init(&ss->map, strset_item_hashcmp, NULL, 0);
-
-	if (hashmap_get_from_hash(&ss->map, hash, str))
+	if (strset_contains(dups, str))
 		return 1;
-
-	FLEX_ALLOC_STR(item, value, str);
-	hashmap_entry_init(&item->ent, hash);
-	hashmap_add(&ss->map, &item->ent);
+	strset_add(dups, str);
 	return 0;
 }
 
-static void strset_clear(struct strset *ss)
-{
-	if (!ss->map.table)
-		return;
-	hashmap_clear_and_free(&ss->map, struct strset_item, ent);
-}
-
 static void insert_records_from_trailers(struct shortlog *log,
 					 struct strset *dups,
 					 struct commit *commit,
@@ -253,7 +208,7 @@ static void insert_records_from_trailers(struct shortlog *log,
 		if (!parse_ident(log, &ident, value))
 			value = ident.buf;
 
-		if (strset_check_and_add(dups, value))
+		if (check_dup(dups, value))
 			continue;
 		insert_one_record(log, value, oneline);
 	}
@@ -291,7 +246,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 				      log->email ? "%aN <%aE>" : "%aN",
 				      &ident, &ctx);
 		if (!HAS_MULTI_BITS(log->groups) ||
-		    !strset_check_and_add(&dups, ident.buf))
+		    !check_dup(&dups, ident.buf))
 			insert_one_record(log, ident.buf, oneline_str);
 	}
 	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
@@ -300,7 +255,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 				      log->email ? "%cN <%cE>" : "%cN",
 				      &ident, &ctx);
 		if (!HAS_MULTI_BITS(log->groups) ||
-		    !strset_check_and_add(&dups, ident.buf))
+		    !check_dup(&dups, ident.buf))
 			insert_one_record(log, ident.buf, oneline_str);
 	}
 	if (log->groups & SHORTLOG_GROUP_TRAILER) {
-- 
2.29.2.575.gb51baa09ad


^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/13] Add struct strmap and associated utility functions
  2020-11-05 13:29       ` [PATCH v4 00/13] Add struct strmap and associated utility functions Jeff King
@ 2020-11-05 20:25         ` Junio C Hamano
  2020-11-05 21:17           ` Jeff King
  2020-11-05 21:22           ` Elijah Newren
  0 siblings, 2 replies; 144+ messages in thread
From: Junio C Hamano @ 2020-11-05 20:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren

Jeff King <peff@peff.net> writes:

> Subject: [PATCH] shortlog: drop custom strset implementation
>
> We can use the strset recently added in strmap.h instead. Note that this
> doesn't have a "check_and_add" function. We can easily write the same
> thing using separate "contains" and "adds" calls. This is slightly less
> efficient, in that it hashes the string twice, but for our use here it
> shouldn't be a big deal either way.
>
> I did leave it as a separate helper function, though, since we use it in
> three separate spots (some of which are in the middle of a conditional).

It makes sense, but check_dup() sounds like its use pattern would be

	if (check_dup(it) == NO_DUP)
		add(it);

where it is more like "add it but just once".

By the way, is a strset a set or a bag?  If it makes the second add
an no-op, perhaps your check_dup() is what strset_add() should do
itself?  What builtin/shortlog.c::check_dup() does smells like it is
a workaround for the lack of a naturally-expected feature.


^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/13] Add struct strmap and associated utility functions
  2020-11-05 20:25         ` Junio C Hamano
@ 2020-11-05 21:17           ` Jeff King
  2020-11-05 21:22           ` Elijah Newren
  1 sibling, 0 replies; 144+ messages in thread
From: Jeff King @ 2020-11-05 21:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren

On Thu, Nov 05, 2020 at 12:25:14PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Subject: [PATCH] shortlog: drop custom strset implementation
> >
> > We can use the strset recently added in strmap.h instead. Note that this
> > doesn't have a "check_and_add" function. We can easily write the same
> > thing using separate "contains" and "adds" calls. This is slightly less
> > efficient, in that it hashes the string twice, but for our use here it
> > shouldn't be a big deal either way.
> >
> > I did leave it as a separate helper function, though, since we use it in
> > three separate spots (some of which are in the middle of a conditional).
> 
> It makes sense, but check_dup() sounds like its use pattern would be
> 
> 	if (check_dup(it) == NO_DUP)
> 		add(it);
> 
> where it is more like "add it but just once".

Hmph. I picked that name (versus just "contains") hoping it be general
enough to cover the dual operation.  Better name suggestions are
welcome. Though...

> By the way, is a strset a set or a bag?  If it makes the second add
> an no-op, perhaps your check_dup() is what strset_add() should do
> itself?  What builtin/shortlog.c::check_dup() does smells like it is
> a workaround for the lack of a naturally-expected feature.

Yes, if strset_add() returned an integer telling us whether the item was
already in the set, then we could use it directly. It's slightly
non-trivial to do, though, as it's built around strmap_put(), which
returns a pointer to the old value. But since we're a set and not a map,
that value is always NULL; we can't tell the difference between "I was
storing an old value which was NULL" and "I was not storing any value".

If strset were built around strintmap it could store "1" for "present in
the set". It somehow feels hacky, though, to induce extra value writes
just for the sake of working around the API.

Since strset is defined within strmap.c, perhaps it wouldn't be too bad
for it to be more intimate with the details here. I.e., to use
find_strmap_entry() directly, and if the value is not present, to create
a new hashmap entry. That would require hacking up strmap_put() into a
few helpers, but it's probably not too bad.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/13] Add struct strmap and associated utility functions
  2020-11-05 20:25         ` Junio C Hamano
  2020-11-05 21:17           ` Jeff King
@ 2020-11-05 21:22           ` Elijah Newren
  2020-11-05 22:15             ` Junio C Hamano
  1 sibling, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-11-05 21:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Nov 5, 2020 at 12:25 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > Subject: [PATCH] shortlog: drop custom strset implementation
> >
> > We can use the strset recently added in strmap.h instead. Note that this
> > doesn't have a "check_and_add" function. We can easily write the same
> > thing using separate "contains" and "adds" calls. This is slightly less
> > efficient, in that it hashes the string twice, but for our use here it
> > shouldn't be a big deal either way.
> >
> > I did leave it as a separate helper function, though, since we use it in
> > three separate spots (some of which are in the middle of a conditional).
>
> It makes sense, but check_dup() sounds like its use pattern would be
>
>         if (check_dup(it) == NO_DUP)
>                 add(it);
>
> where it is more like "add it but just once".
>
> By the way, is a strset a set or a bag?  If it makes the second add

strset is a set; there is no way to get duplicate entries.

> an no-op, perhaps your check_dup() is what strset_add() should do
> itself?  What builtin/shortlog.c::check_dup() does smells like it is
> a workaround for the lack of a naturally-expected feature.

Is the expectation that strset_add() would return a boolean for
whether a new entry was added?

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/13] Add struct strmap and associated utility functions
  2020-11-05 21:22           ` Elijah Newren
@ 2020-11-05 22:15             ` Junio C Hamano
  0 siblings, 0 replies; 144+ messages in thread
From: Junio C Hamano @ 2020-11-05 22:15 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jeff King, Elijah Newren via GitGitGadget, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

>> It makes sense, but check_dup() sounds like its use pattern would be
>>
>>         if (check_dup(it) == NO_DUP)
>>                 add(it);
>>
>> where it is more like "add it but just once".
>>
>> By the way, is a strset a set or a bag?  If it makes the second add
>
> strset is a set; there is no way to get duplicate entries.
>
>> an no-op, perhaps your check_dup() is what strset_add() should do
>> itself?  What builtin/shortlog.c::check_dup() does smells like it is
>> a workaround for the lack of a naturally-expected feature.
>
> Is the expectation that strset_add() would return a boolean for
> whether a new entry was added?

It seems to be a reasonable expectation that the caller can tell if
the add was "already there and was a no-op", judging from what we
saw in the shortlog code, which was the first audience the API was
introduced to support.  It seems to benefit from it if it were
available, and has to work around the lack of it with check_dup()
wrapper.



^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 00/15] Add struct strmap and associated utility functions
  2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
                         ` (13 preceding siblings ...)
  2020-11-05 13:29       ` [PATCH v4 00/13] Add struct strmap and associated utility functions Jeff King
@ 2020-11-06  0:24       ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
                           ` (16 more replies)
  14 siblings, 17 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

Here I introduce new strmap, strintmap, and strset types. This strmap type
was based on Peff's proposal from a couple years ago[1], but has additions
that I made as I used it, and a number of additions/changes suggested by
Peff in his reviews (and Junio in his). I also start the series off with
some changes to hashmap, based on Peff's feedback on v1 & v2.

NOTE: While en/merge-ort-impl depends on this series, there are no changes
in v5 that affect it so en/merge-ort-impl does not need a reroll.

Changes since v4:

 * Make strset_add() return a boolean -- 1 if it added the value to the set,
   0 if the value was already in the set.
 * Add a preparatory patch to the above which adds a create_entry() helper
   function so that strset_add() can bypass strmap_put().
 * Add a patch which updates shortlog to use the new strset API.

[1] 
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Elijah Newren (15):
  hashmap: add usage documentation explaining hashmap_free[_entries]()
  hashmap: adjust spacing to fix argument alignment
  hashmap: allow re-use after hashmap_free()
  hashmap: introduce a new hashmap_partial_clear()
  hashmap: provide deallocation function names
  strmap: new utility functions
  strmap: add more utility functions
  strmap: enable faster clearing and reusing of strmaps
  strmap: add functions facilitating use as a string->int map
  strmap: split create_entry() out of strmap_put()
  strmap: add a strset sub-type
  strmap: enable allocations to come from a mem_pool
  strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  Use new HASHMAP_INIT macro to simplify hashmap initialization
  shortlog: use strset from strmap.h

 Makefile                |   1 +
 add-interactive.c       |   2 +-
 attr.c                  |  26 ++--
 blame.c                 |   2 +-
 bloom.c                 |   5 +-
 builtin/difftool.c      |   9 +-
 builtin/fetch.c         |   6 +-
 builtin/shortlog.c      |  61 +--------
 config.c                |   2 +-
 diff.c                  |   4 +-
 diffcore-rename.c       |   2 +-
 dir.c                   |   8 +-
 hashmap.c               |  74 +++++++----
 hashmap.h               |  91 +++++++++++---
 merge-recursive.c       |   6 +-
 name-hash.c             |   4 +-
 object.c                |   2 +-
 oidmap.c                |   2 +-
 patch-ids.c             |   2 +-
 range-diff.c            |   6 +-
 ref-filter.c            |   2 +-
 revision.c              |  11 +-
 sequencer.c             |   4 +-
 strmap.c                | 178 ++++++++++++++++++++++++++
 strmap.h                | 268 ++++++++++++++++++++++++++++++++++++++++
 submodule-config.c      |   4 +-
 t/helper/test-hashmap.c |   9 +-
 27 files changed, 621 insertions(+), 170 deletions(-)
 create mode 100644 strmap.c
 create mode 100644 strmap.h


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-835%2Fnewren%2Fstrmap-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-835/newren/strmap-v5
Pull-Request: https://github.com/git/git/pull/835

Range-diff vs v4:

  1:  af6b6fcb46 =  1:  af6b6fcb46 hashmap: add usage documentation explaining hashmap_free[_entries]()
  2:  591161fd78 =  2:  591161fd78 hashmap: adjust spacing to fix argument alignment
  3:  f2718d036d =  3:  f2718d036d hashmap: allow re-use after hashmap_free()
  4:  61f1da3c51 =  4:  61f1da3c51 hashmap: introduce a new hashmap_partial_clear()
  5:  861e8d65ae =  5:  861e8d65ae hashmap: provide deallocation function names
  6:  448d3b219f =  6:  448d3b219f strmap: new utility functions
  7:  5e8004c728 =  7:  5e8004c728 strmap: add more utility functions
  8:  fd96e9fc8d =  8:  fd96e9fc8d strmap: enable faster clearing and reusing of strmaps
  9:  f499934f54 =  9:  f499934f54 strmap: add functions facilitating use as a string->int map
  -:  ---------- > 10:  3bcceb8cdb strmap: split create_entry() out of strmap_put()
 10:  ee1ec55f1b ! 11:  e128a71fec strmap: add a strset sub-type
     @@ Commit message
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     + ## strmap.c ##
     +@@ strmap.c: void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
     + 	else
     + 		strintmap_set(map, str, map->default_value + amt);
     + }
     ++
     ++int strset_add(struct strset *set, const char *str)
     ++{
     ++	/*
     ++	 * Cannot use strmap_put() because it'll return NULL in both cases:
     ++	 *   - cannot find str: NULL means "not found"
     ++	 *   - does find str: NULL is the value associated with str
     ++	 */
     ++	struct strmap_entry *entry = find_strmap_entry(&set->map, str);
     ++
     ++	if (entry)
     ++		return 0;
     ++
     ++	entry = create_entry(&set->map, str, NULL);
     ++	hashmap_add(&set->map.map, &entry->ent);
     ++	return 1;
     ++}
     +
       ## strmap.h ##
      @@ strmap.h: int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
       			.map = STRMAP_INIT,   \
     @@ strmap.h: static inline void strintmap_set(struct strintmap *map, const char *st
      +	return strmap_get_size(&set->map);
      +}
      +
     -+static inline void strset_add(struct strset *set, const char *str)
     -+{
     -+	strmap_put(&set->map, str, NULL);
     -+}
     ++/* Returns 1 if str is added to the set; returns 0 if str was already in set */
     ++int strset_add(struct strset *set, const char *str);
      +
       #endif /* STRMAP_H */
 11:  73a57045c3 ! 12:  34f542d9dd strmap: enable allocations to come from a mem_pool
     @@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_values)
       	}
       }
       
     -@@ strmap.c: void *strmap_put(struct strmap *map, const char *str, void *data)
     - 	} else {
     - 		const char *key = str;
     - 
     --		entry = xmalloc(sizeof(*entry));
     -+		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
     -+				  : xmalloc(sizeof(*entry));
     - 		hashmap_entry_init(&entry->ent, strhash(str));
     - 
     - 		if (map->strdup_strings)
     --			key = xstrdup(str);
     -+			key = map->pool ? mem_pool_strdup(map->pool, str)
     -+					: xstrdup(str);
     - 		entry->key = key;
     - 		entry->value = data;
     - 		hashmap_add(&map->map, &entry->ent);
     +@@ strmap.c: static struct strmap_entry *create_entry(struct strmap *map,
     + 	struct strmap_entry *entry;
     + 	const char *key = str;
     + 
     +-	entry = xmalloc(sizeof(*entry));
     ++	entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
     ++			  : xmalloc(sizeof(*entry));
     + 	hashmap_entry_init(&entry->ent, strhash(str));
     + 
     + 	if (map->strdup_strings)
     +-		key = xstrdup(str);
     ++		key = map->pool ? mem_pool_strdup(map->pool, str)
     ++				: xstrdup(str);
     + 	entry->key = key;
     + 	entry->value = data;
     + 	return entry;
      @@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_value)
       		return;
       	if (free_value)
 12:  0352260de4 ! 13:  39ec2fa411 strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
     @@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_values)
       	}
       }
       
     -@@ strmap.c: void strmap_partial_clear(struct strmap *map, int free_values)
     - void *strmap_put(struct strmap *map, const char *str, void *data)
     +@@ strmap.c: static struct strmap_entry *create_entry(struct strmap *map,
     + 					 void *data)
       {
     - 	struct strmap_entry *entry = find_strmap_entry(map, str);
     --	void *old = NULL;
     + 	struct strmap_entry *entry;
     +-	const char *key = str;
       
     - 	if (entry) {
     --		old = entry->value;
     -+		void *old = entry->value;
     - 		entry->value = data;
     --	} else {
     --		const char *key = str;
     --
     --		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
     --				  : xmalloc(sizeof(*entry));
     --		hashmap_entry_init(&entry->ent, strhash(str));
     -+		return old;
     -+	}
     - 
     --		if (map->strdup_strings)
     --			key = map->pool ? mem_pool_strdup(map->pool, str)
     --					: xstrdup(str);
     --		entry->key = key;
     --		entry->value = data;
     --		hashmap_add(&map->map, &entry->ent);
     +-	entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
     +-			  : xmalloc(sizeof(*entry));
      +	if (map->strdup_strings) {
      +		if (!map->pool) {
      +			FLEXPTR_ALLOC_STR(entry, key, str);
     @@ strmap.c: void strmap_partial_clear(struct strmap *map, int free_values)
      +		entry = xmalloc(sizeof(*entry));
      +	} else {
      +		entry = mem_pool_alloc(map->pool, sizeof(*entry));
     - 	}
     --	return old;
     -+	hashmap_entry_init(&entry->ent, strhash(str));
     ++	}
     + 	hashmap_entry_init(&entry->ent, strhash(str));
     +-
     +-	if (map->strdup_strings)
     +-		key = map->pool ? mem_pool_strdup(map->pool, str)
     +-				: xstrdup(str);
     +-	entry->key = key;
      +	if (!map->strdup_strings)
      +		entry->key = str;
     -+	entry->value = data;
     -+	hashmap_add(&map->map, &entry->ent);
     -+	return NULL;
     + 	entry->value = data;
     + 	return entry;
       }
     - 
     - struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
      @@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_value)
       		return;
       	if (free_value)
 13:  617926540b = 14:  d3713d88f2 Use new HASHMAP_INIT macro to simplify hashmap initialization
  -:  ---------- > 15:  24e5ce60f5 shortlog: use strset from strmap.h

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
                           ` (15 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The existence of hashmap_free() and hashmap_free_entries() confused me,
and the docs weren't clear enough.  We are dealing with a map table,
entries in that table, and possibly also things each of those entries
point to.  I had to consult other source code examples and the
implementation.  Add a brief note to clarify the differences.  This will
become even more important once we introduce a new
hashmap_partial_clear() function which will add the question of whether
the table itself has been freed.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.h | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/hashmap.h b/hashmap.h
index b011b394fe..2994dc7a9c 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -236,13 +236,40 @@ void hashmap_init(struct hashmap *map,
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
- * Frees a hashmap structure and allocated memory, leaves entries undisturbed
+ * Frees a hashmap structure and allocated memory for the table, but does not
+ * free the entries nor anything they point to.
+ *
+ * Usage note:
+ *
+ * Many callers will need to iterate over all entries and free the data each
+ * entry points to; in such a case, they can free the entry itself while at it.
+ * Thus, you might see:
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *      free(e);
+ *    }
+ *    hashmap_free(map);
+ *
+ * instead of
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *    }
+ *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ *
+ * to avoid the implicit extra loop over the entries.  However, if there are
+ * no special fields in your entry that need to be freed beyond the entry
+ * itself, it is probably simpler to avoid the explicit loop and just call
+ * hashmap_free_entries().
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map
+ * where @member is the hashmap_entry struct used to associate with @map.
+ *
+ * See usage note above hashmap_free().
  */
 #define hashmap_free_entries(map, type, member) \
 	hashmap_free_(map, offsetof(type, member));
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 02/15] hashmap: adjust spacing to fix argument alignment
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
                           ` (14 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

No actual code changes; just whitespace adjustments.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 17 +++++++++--------
 hashmap.h | 22 +++++++++++-----------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index 09813e1a46..e44d8a3e85 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -92,8 +92,9 @@ static void alloc_table(struct hashmap *map, unsigned int size)
 }
 
 static inline int entry_equals(const struct hashmap *map,
-		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
-		const void *keydata)
+			       const struct hashmap_entry *e1,
+			       const struct hashmap_entry *e2,
+			       const void *keydata)
 {
 	return (e1 == e2) ||
 	       (e1->hash == e2->hash &&
@@ -101,7 +102,7 @@ static inline int entry_equals(const struct hashmap *map,
 }
 
 static inline unsigned int bucket(const struct hashmap *map,
-		const struct hashmap_entry *key)
+				  const struct hashmap_entry *key)
 {
 	return key->hash & (map->tablesize - 1);
 }
@@ -148,7 +149,7 @@ static int always_equal(const void *unused_cmp_data,
 }
 
 void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		const void *cmpfn_data, size_t initial_size)
+		  const void *cmpfn_data, size_t initial_size)
 {
 	unsigned int size = HASHMAP_INITIAL_SIZE;
 
@@ -199,7 +200,7 @@ struct hashmap_entry *hashmap_get(const struct hashmap *map,
 }
 
 struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
-			const struct hashmap_entry *entry)
+				       const struct hashmap_entry *entry)
 {
 	struct hashmap_entry *e = entry->next;
 	for (; e; e = e->next)
@@ -225,8 +226,8 @@ void hashmap_add(struct hashmap *map, struct hashmap_entry *entry)
 }
 
 struct hashmap_entry *hashmap_remove(struct hashmap *map,
-					const struct hashmap_entry *key,
-					const void *keydata)
+				     const struct hashmap_entry *key,
+				     const void *keydata)
 {
 	struct hashmap_entry *old;
 	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
@@ -249,7 +250,7 @@ struct hashmap_entry *hashmap_remove(struct hashmap *map,
 }
 
 struct hashmap_entry *hashmap_put(struct hashmap *map,
-				struct hashmap_entry *entry)
+				  struct hashmap_entry *entry)
 {
 	struct hashmap_entry *old = hashmap_remove(map, entry, NULL);
 	hashmap_add(map, entry);
diff --git a/hashmap.h b/hashmap.h
index 2994dc7a9c..904f61d6e1 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -228,9 +228,9 @@ struct hashmap {
  * prevent expensive resizing. If 0, the table is dynamically resized.
  */
 void hashmap_init(struct hashmap *map,
-			 hashmap_cmp_fn equals_function,
-			 const void *equals_function_data,
-			 size_t initial_size);
+		  hashmap_cmp_fn equals_function,
+		  const void *equals_function_data,
+		  size_t initial_size);
 
 /* internal function for freeing hashmap */
 void hashmap_free_(struct hashmap *map, ssize_t offset);
@@ -288,7 +288,7 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  * and if it is on stack, you can just let it go out of scope).
  */
 static inline void hashmap_entry_init(struct hashmap_entry *e,
-					unsigned int hash)
+				      unsigned int hash)
 {
 	e->hash = hash;
 	e->next = NULL;
@@ -330,8 +330,8 @@ static inline unsigned int hashmap_get_size(struct hashmap *map)
  * to `hashmap_cmp_fn` to decide whether the entry matches the key.
  */
 struct hashmap_entry *hashmap_get(const struct hashmap *map,
-				const struct hashmap_entry *key,
-				const void *keydata);
+				  const struct hashmap_entry *key,
+				  const void *keydata);
 
 /*
  * Returns the hashmap entry for the specified hash code and key data,
@@ -364,7 +364,7 @@ static inline struct hashmap_entry *hashmap_get_from_hash(
  * call to `hashmap_get` or `hashmap_get_next`.
  */
 struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
-			const struct hashmap_entry *entry);
+				       const struct hashmap_entry *entry);
 
 /*
  * Adds a hashmap entry. This allows to add duplicate entries (i.e.
@@ -384,7 +384,7 @@ void hashmap_add(struct hashmap *map, struct hashmap_entry *entry);
  * Returns the replaced entry, or NULL if not found (i.e. the entry was added).
  */
 struct hashmap_entry *hashmap_put(struct hashmap *map,
-				struct hashmap_entry *entry);
+				  struct hashmap_entry *entry);
 
 /*
  * Adds or replaces a hashmap entry contained within @keyvar,
@@ -406,8 +406,8 @@ struct hashmap_entry *hashmap_put(struct hashmap *map,
  * Argument explanation is the same as in `hashmap_get`.
  */
 struct hashmap_entry *hashmap_remove(struct hashmap *map,
-					const struct hashmap_entry *key,
-					const void *keydata);
+				     const struct hashmap_entry *key,
+				     const void *keydata);
 
 /*
  * Removes a hashmap entry contained within @keyvar,
@@ -449,7 +449,7 @@ struct hashmap_entry *hashmap_iter_next(struct hashmap_iter *iter);
 
 /* Initializes the iterator and returns the first entry, if any. */
 static inline struct hashmap_entry *hashmap_iter_first(struct hashmap *map,
-		struct hashmap_iter *iter)
+						       struct hashmap_iter *iter)
 {
 	hashmap_iter_init(map, iter);
 	return hashmap_iter_next(iter);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 03/15] hashmap: allow re-use after hashmap_free()
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
                           ` (13 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Previously, once map->table had been freed, any calls to hashmap_put(),
hashmap_get(), or hashmap_remove() would cause a NULL pointer
dereference (since hashmap_free_() also zeros the memory; without that
zeroing, calling these functions would cause a use-after-free problem).

Modify these functions to check for a NULL table and automatically
allocate as needed.

Also add a HASHMAP_INIT(fn, data) macro for initializing hashmaps on the
stack without calling hashmap_init().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 16 ++++++++++++++--
 hashmap.h |  3 +++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index e44d8a3e85..bb7c9979b8 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -114,6 +114,7 @@ int hashmap_bucket(const struct hashmap *map, unsigned int hash)
 
 static void rehash(struct hashmap *map, unsigned int newsize)
 {
+	/* map->table MUST NOT be NULL when this function is called */
 	unsigned int i, oldsize = map->tablesize;
 	struct hashmap_entry **oldtable = map->table;
 
@@ -134,6 +135,7 @@ static void rehash(struct hashmap *map, unsigned int newsize)
 static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
 		const struct hashmap_entry *key, const void *keydata)
 {
+	/* map->table MUST NOT be NULL when this function is called */
 	struct hashmap_entry **e = &map->table[bucket(map, key)];
 	while (*e && !entry_equals(map, *e, key, keydata))
 		e = &(*e)->next;
@@ -196,6 +198,8 @@ struct hashmap_entry *hashmap_get(const struct hashmap *map,
 				const struct hashmap_entry *key,
 				const void *keydata)
 {
+	if (!map->table)
+		return NULL;
 	return *find_entry_ptr(map, key, keydata);
 }
 
@@ -211,8 +215,12 @@ struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
 
 void hashmap_add(struct hashmap *map, struct hashmap_entry *entry)
 {
-	unsigned int b = bucket(map, entry);
+	unsigned int b;
+
+	if (!map->table)
+		alloc_table(map, HASHMAP_INITIAL_SIZE);
 
+	b = bucket(map, entry);
 	/* add entry */
 	entry->next = map->table[b];
 	map->table[b] = entry;
@@ -230,7 +238,11 @@ struct hashmap_entry *hashmap_remove(struct hashmap *map,
 				     const void *keydata)
 {
 	struct hashmap_entry *old;
-	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
+	struct hashmap_entry **e;
+
+	if (!map->table)
+		return NULL;
+	e = find_entry_ptr(map, key, keydata);
 	if (!*e)
 		return NULL;
 
diff --git a/hashmap.h b/hashmap.h
index 904f61d6e1..3b0f2bcade 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -210,6 +210,9 @@ struct hashmap {
 
 /* hashmap functions */
 
+#define HASHMAP_INIT(fn, data) { .cmpfn = fn, .cmpfn_data = data, \
+				 .do_count_items = 1 }
+
 /*
  * Initializes a hashmap structure.
  *
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 04/15] hashmap: introduce a new hashmap_partial_clear()
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (2 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
                           ` (12 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge-ort is a heavy user of strmaps, which are built on hashmap.[ch].
clear_or_reinit_internal_opts() in merge-ort was taking about 12% of
overall runtime in my testcase involving rebasing 35 patches of
linux.git across a big rename.  clear_or_reinit_internal_opts() was
calling hashmap_free() followed by hashmap_init(), meaning that not only
was it freeing all the memory associated with each of the strmaps just
to immediately allocate a new array again, it was allocating a new array
that was likely smaller than needed (thus resulting in later need to
rehash things).  The ending size of the map table on the previous commit
was likely almost perfectly sized for the next commit we wanted to pick,
and not dropping and reallocating the table immediately is a win.

Add some new API to hashmap to clear a hashmap of entries without
freeing map->table (and instead only zeroing it out like alloc_table()
would do, along with zeroing the count of items in the table and the
shrink_at field).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 39 +++++++++++++++++++++++++++------------
 hashmap.h | 13 ++++++++++++-
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index bb7c9979b8..922ed07954 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -174,22 +174,37 @@ void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
 	map->do_count_items = 1;
 }
 
+static void free_individual_entries(struct hashmap *map, ssize_t entry_offset)
+{
+	struct hashmap_iter iter;
+	struct hashmap_entry *e;
+
+	hashmap_iter_init(map, &iter);
+	while ((e = hashmap_iter_next(&iter)))
+		/*
+		 * like container_of, but using caller-calculated
+		 * offset (caller being hashmap_free_entries)
+		 */
+		free((char *)e - entry_offset);
+}
+
+void hashmap_partial_clear_(struct hashmap *map, ssize_t entry_offset)
+{
+	if (!map || !map->table)
+		return;
+	if (entry_offset >= 0)  /* called by hashmap_clear_entries */
+		free_individual_entries(map, entry_offset);
+	memset(map->table, 0, map->tablesize * sizeof(struct hashmap_entry *));
+	map->shrink_at = 0;
+	map->private_size = 0;
+}
+
 void hashmap_free_(struct hashmap *map, ssize_t entry_offset)
 {
 	if (!map || !map->table)
 		return;
-	if (entry_offset >= 0) { /* called by hashmap_free_entries */
-		struct hashmap_iter iter;
-		struct hashmap_entry *e;
-
-		hashmap_iter_init(map, &iter);
-		while ((e = hashmap_iter_next(&iter)))
-			/*
-			 * like container_of, but using caller-calculated
-			 * offset (caller being hashmap_free_entries)
-			 */
-			free((char *)e - entry_offset);
-	}
+	if (entry_offset >= 0)  /* called by hashmap_free_entries */
+		free_individual_entries(map, entry_offset);
 	free(map->table);
 	memset(map, 0, sizeof(*map));
 }
diff --git a/hashmap.h b/hashmap.h
index 3b0f2bcade..e9430d582a 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -235,7 +235,8 @@ void hashmap_init(struct hashmap *map,
 		  const void *equals_function_data,
 		  size_t initial_size);
 
-/* internal function for freeing hashmap */
+/* internal functions for clearing or freeing hashmap */
+void hashmap_partial_clear_(struct hashmap *map, ssize_t offset);
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
@@ -268,6 +269,16 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
+/*
+ * Basically the same as calling hashmap_free() followed by hashmap_init(),
+ * but doesn't incur the overhead of deallocating and reallocating
+ * map->table; it leaves map->table allocated and the same size but zeroes
+ * it out so it's ready for use again as an empty map.  As with
+ * hashmap_free(), you may need to free the entries yourself before calling
+ * this function.
+ */
+#define hashmap_partial_clear(map) hashmap_partial_clear_(map, -1)
+
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
  * where @member is the hashmap_entry struct used to associate with @map.
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 05/15] hashmap: provide deallocation function names
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (3 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 06/15] strmap: new utility functions Elijah Newren via GitGitGadget
                           ` (11 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported.  Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over
to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 add-interactive.c       |  2 +-
 blame.c                 |  2 +-
 bloom.c                 |  2 +-
 builtin/fetch.c         |  6 +++---
 builtin/shortlog.c      |  2 +-
 config.c                |  2 +-
 diff.c                  |  4 ++--
 diffcore-rename.c       |  2 +-
 dir.c                   |  8 ++++----
 hashmap.c               |  6 +++---
 hashmap.h               | 44 +++++++++++++++++++++++++----------------
 merge-recursive.c       |  6 +++---
 name-hash.c             |  4 ++--
 object.c                |  2 +-
 oidmap.c                |  2 +-
 patch-ids.c             |  2 +-
 range-diff.c            |  2 +-
 ref-filter.c            |  2 +-
 revision.c              |  2 +-
 sequencer.c             |  4 ++--
 submodule-config.c      |  4 ++--
 t/helper/test-hashmap.c |  6 +++---
 22 files changed, 63 insertions(+), 53 deletions(-)

diff --git a/add-interactive.c b/add-interactive.c
index 555c4abf32..a14c0feaa2 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -557,7 +557,7 @@ static int get_modified_files(struct repository *r,
 		if (ps)
 			clear_pathspec(&rev.prune_data);
 	}
-	hashmap_free_entries(&s.file_map, struct pathname_entry, ent);
+	hashmap_clear_and_free(&s.file_map, struct pathname_entry, ent);
 	if (unmerged_count)
 		*unmerged_count = s.unmerged_count;
 	if (binary_count)
diff --git a/blame.c b/blame.c
index 686845b2b4..229beb6452 100644
--- a/blame.c
+++ b/blame.c
@@ -435,7 +435,7 @@ static void get_fingerprint(struct fingerprint *result,
 
 static void free_fingerprint(struct fingerprint *f)
 {
-	hashmap_free(&f->map);
+	hashmap_clear(&f->map);
 	free(f->entries);
 }
 
diff --git a/bloom.c b/bloom.c
index 68c73200a5..719c313a1c 100644
--- a/bloom.c
+++ b/bloom.c
@@ -287,7 +287,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 		}
 
 	cleanup:
-		hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry);
+		hashmap_clear_and_free(&pathmap, struct pathmap_hash_entry, entry);
 	} else {
 		for (i = 0; i < diff_queued_diff.nr; i++)
 			diff_free_filepair(diff_queued_diff.queue[i]);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f9c3c49f14..ecf8537605 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -393,7 +393,7 @@ static void find_non_local_tags(const struct ref *refs,
 		item = refname_hash_add(&remote_refs, ref->name, &ref->old_oid);
 		string_list_insert(&remote_refs_list, ref->name);
 	}
-	hashmap_free_entries(&existing_refs, struct refname_hash_entry, ent);
+	hashmap_clear_and_free(&existing_refs, struct refname_hash_entry, ent);
 
 	/*
 	 * We may have a final lightweight tag that needs to be
@@ -428,7 +428,7 @@ static void find_non_local_tags(const struct ref *refs,
 		**tail = rm;
 		*tail = &rm->next;
 	}
-	hashmap_free_entries(&remote_refs, struct refname_hash_entry, ent);
+	hashmap_clear_and_free(&remote_refs, struct refname_hash_entry, ent);
 	string_list_clear(&remote_refs_list, 0);
 	oidset_clear(&fetch_oids);
 }
@@ -573,7 +573,7 @@ static struct ref *get_ref_map(struct remote *remote,
 		}
 	}
 	if (existing_refs_populated)
-		hashmap_free_entries(&existing_refs, struct refname_hash_entry, ent);
+		hashmap_clear_and_free(&existing_refs, struct refname_hash_entry, ent);
 
 	return ref_map;
 }
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 0a5c4968f6..83f0a739b4 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -220,7 +220,7 @@ static void strset_clear(struct strset *ss)
 {
 	if (!ss->map.table)
 		return;
-	hashmap_free_entries(&ss->map, struct strset_item, ent);
+	hashmap_clear_and_free(&ss->map, struct strset_item, ent);
 }
 
 static void insert_records_from_trailers(struct shortlog *log,
diff --git a/config.c b/config.c
index 2bdff4457b..8f324ed3a6 100644
--- a/config.c
+++ b/config.c
@@ -1963,7 +1963,7 @@ void git_configset_clear(struct config_set *cs)
 		free(entry->key);
 		string_list_clear(&entry->value_list, 1);
 	}
-	hashmap_free_entries(&cs->config_hash, struct config_set_element, ent);
+	hashmap_clear_and_free(&cs->config_hash, struct config_set_element, ent);
 	cs->hash_initialized = 0;
 	free(cs->list.items);
 	cs->list.nr = 0;
diff --git a/diff.c b/diff.c
index 2bb2f8f57e..8e0e59f5cf 100644
--- a/diff.c
+++ b/diff.c
@@ -6289,9 +6289,9 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
 				dim_moved_lines(o);
 
-			hashmap_free_entries(&add_lines, struct moved_entry,
+			hashmap_clear_and_free(&add_lines, struct moved_entry,
 						ent);
-			hashmap_free_entries(&del_lines, struct moved_entry,
+			hashmap_clear_and_free(&del_lines, struct moved_entry,
 						ent);
 		}
 
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 99e63e90f8..d367a6d244 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -407,7 +407,7 @@ static int find_exact_renames(struct diff_options *options)
 		renames += find_identical_files(&file_table, i, options);
 
 	/* Free the hash data structure and entries */
-	hashmap_free_entries(&file_table, struct file_similarity, entry);
+	hashmap_clear_and_free(&file_table, struct file_similarity, entry);
 
 	return renames;
 }
diff --git a/dir.c b/dir.c
index 78387110e6..161dce121e 100644
--- a/dir.c
+++ b/dir.c
@@ -817,8 +817,8 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 
 clear_hashmaps:
 	warning(_("disabling cone pattern matching"));
-	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
-	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->parent_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->recursive_hashmap, struct pattern_entry, ent);
 	pl->use_cone_patterns = 0;
 }
 
@@ -921,8 +921,8 @@ void clear_pattern_list(struct pattern_list *pl)
 		free(pl->patterns[i]);
 	free(pl->patterns);
 	free(pl->filebuf);
-	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
-	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->recursive_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->parent_hashmap, struct pattern_entry, ent);
 
 	memset(pl, 0, sizeof(*pl));
 }
diff --git a/hashmap.c b/hashmap.c
index 922ed07954..5009471800 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -183,7 +183,7 @@ static void free_individual_entries(struct hashmap *map, ssize_t entry_offset)
 	while ((e = hashmap_iter_next(&iter)))
 		/*
 		 * like container_of, but using caller-calculated
-		 * offset (caller being hashmap_free_entries)
+		 * offset (caller being hashmap_clear_and_free)
 		 */
 		free((char *)e - entry_offset);
 }
@@ -199,11 +199,11 @@ void hashmap_partial_clear_(struct hashmap *map, ssize_t entry_offset)
 	map->private_size = 0;
 }
 
-void hashmap_free_(struct hashmap *map, ssize_t entry_offset)
+void hashmap_clear_(struct hashmap *map, ssize_t entry_offset)
 {
 	if (!map || !map->table)
 		return;
-	if (entry_offset >= 0)  /* called by hashmap_free_entries */
+	if (entry_offset >= 0)  /* called by hashmap_clear_and_free */
 		free_individual_entries(map, entry_offset);
 	free(map->table);
 	memset(map, 0, sizeof(*map));
diff --git a/hashmap.h b/hashmap.h
index e9430d582a..7251687d73 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -96,7 +96,7 @@
  *         }
  *
  *         if (!strcmp("end", action)) {
- *             hashmap_free_entries(&map, struct long2string, ent);
+ *             hashmap_clear_and_free(&map, struct long2string, ent);
  *             break;
  *         }
  *     }
@@ -237,7 +237,7 @@ void hashmap_init(struct hashmap *map,
 
 /* internal functions for clearing or freeing hashmap */
 void hashmap_partial_clear_(struct hashmap *map, ssize_t offset);
-void hashmap_free_(struct hashmap *map, ssize_t offset);
+void hashmap_clear_(struct hashmap *map, ssize_t offset);
 
 /*
  * Frees a hashmap structure and allocated memory for the table, but does not
@@ -253,40 +253,50 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  *      free(e->somefield);
  *      free(e);
  *    }
- *    hashmap_free(map);
+ *    hashmap_clear(map);
  *
  * instead of
  *
  *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
  *      free(e->somefield);
  *    }
- *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ *    hashmap_clear_and_free(map, struct my_entry_struct, hashmap_entry_name);
  *
  * to avoid the implicit extra loop over the entries.  However, if there are
  * no special fields in your entry that need to be freed beyond the entry
  * itself, it is probably simpler to avoid the explicit loop and just call
- * hashmap_free_entries().
+ * hashmap_clear_and_free().
  */
-#define hashmap_free(map) hashmap_free_(map, -1)
+#define hashmap_clear(map) hashmap_clear_(map, -1)
 
 /*
- * Basically the same as calling hashmap_free() followed by hashmap_init(),
- * but doesn't incur the overhead of deallocating and reallocating
- * map->table; it leaves map->table allocated and the same size but zeroes
- * it out so it's ready for use again as an empty map.  As with
- * hashmap_free(), you may need to free the entries yourself before calling
- * this function.
+ * Similar to hashmap_clear(), except that the table is no deallocated; it
+ * is merely zeroed out but left the same size as before.  If the hashmap
+ * will be reused, this avoids the overhead of deallocating and
+ * reallocating map->table.  As with hashmap_clear(), you may need to free
+ * the entries yourself before calling this function.
  */
 #define hashmap_partial_clear(map) hashmap_partial_clear_(map, -1)
 
 /*
- * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map.
+ * Similar to hashmap_clear() but also frees all entries.  @type is the
+ * struct type of the entry where @member is the hashmap_entry struct used
+ * to associate with @map.
  *
- * See usage note above hashmap_free().
+ * See usage note above hashmap_clear().
  */
-#define hashmap_free_entries(map, type, member) \
-	hashmap_free_(map, offsetof(type, member));
+#define hashmap_clear_and_free(map, type, member) \
+	hashmap_clear_(map, offsetof(type, member))
+
+/*
+ * Similar to hashmap_partial_clear() but also frees all entries.  @type is
+ * the struct type of the entry where @member is the hashmap_entry struct
+ * used to associate with @map.
+ *
+ * See usage note above hashmap_clear().
+ */
+#define hashmap_partial_clear_and_free(map, type, member) \
+	hashmap_partial_clear_(map, offsetof(type, member))
 
 /* hashmap_entry functions */
 
diff --git a/merge-recursive.c b/merge-recursive.c
index d0214335a7..f736a0f632 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2651,7 +2651,7 @@ static struct string_list *get_renames(struct merge_options *opt,
 		free(e->target_file);
 		string_list_clear(&e->source_files, 0);
 	}
-	hashmap_free_entries(&collisions, struct collision_entry, ent);
+	hashmap_clear_and_free(&collisions, struct collision_entry, ent);
 	return renames;
 }
 
@@ -2870,7 +2870,7 @@ static void initial_cleanup_rename(struct diff_queue_struct *pairs,
 		strbuf_release(&e->new_dir);
 		/* possible_new_dirs already cleared in get_directory_renames */
 	}
-	hashmap_free_entries(dir_renames, struct dir_rename_entry, ent);
+	hashmap_clear_and_free(dir_renames, struct dir_rename_entry, ent);
 	free(dir_renames);
 
 	free(pairs->queue);
@@ -3497,7 +3497,7 @@ static int merge_trees_internal(struct merge_options *opt,
 		string_list_clear(entries, 1);
 		free(entries);
 
-		hashmap_free_entries(&opt->priv->current_file_dir_set,
+		hashmap_clear_and_free(&opt->priv->current_file_dir_set,
 					struct path_hashmap_entry, e);
 
 		if (clean < 0) {
diff --git a/name-hash.c b/name-hash.c
index fb526a3775..5d3c7b12c1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -726,6 +726,6 @@ void free_name_hash(struct index_state *istate)
 		return;
 	istate->name_hash_initialized = 0;
 
-	hashmap_free(&istate->name_hash);
-	hashmap_free_entries(&istate->dir_hash, struct dir_entry, ent);
+	hashmap_clear(&istate->name_hash);
+	hashmap_clear_and_free(&istate->dir_hash, struct dir_entry, ent);
 }
diff --git a/object.c b/object.c
index 3257518656..b8406409d5 100644
--- a/object.c
+++ b/object.c
@@ -532,7 +532,7 @@ void raw_object_store_clear(struct raw_object_store *o)
 	close_object_store(o);
 	o->packed_git = NULL;
 
-	hashmap_free(&o->pack_map);
+	hashmap_clear(&o->pack_map);
 }
 
 void parsed_object_pool_clear(struct parsed_object_pool *o)
diff --git a/oidmap.c b/oidmap.c
index 423aa014a3..286a04a53c 100644
--- a/oidmap.c
+++ b/oidmap.c
@@ -27,7 +27,7 @@ void oidmap_free(struct oidmap *map, int free_entries)
 		return;
 
 	/* TODO: make oidmap itself not depend on struct layouts */
-	hashmap_free_(&map->map, free_entries ? 0 : -1);
+	hashmap_clear_(&map->map, free_entries ? 0 : -1);
 }
 
 void *oidmap_get(const struct oidmap *map, const struct object_id *key)
diff --git a/patch-ids.c b/patch-ids.c
index 12aa6d494b..21973e4933 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -71,7 +71,7 @@ int init_patch_ids(struct repository *r, struct patch_ids *ids)
 
 int free_patch_ids(struct patch_ids *ids)
 {
-	hashmap_free_entries(&ids->patches, struct patch_id, ent);
+	hashmap_clear_and_free(&ids->patches, struct patch_id, ent);
 	return 0;
 }
 
diff --git a/range-diff.c b/range-diff.c
index 24dc435e48..befeecae44 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -266,7 +266,7 @@ static void find_exact_matches(struct string_list *a, struct string_list *b)
 		}
 	}
 
-	hashmap_free(&map);
+	hashmap_clear(&map);
 }
 
 static void diffsize_consume(void *data, char *line, unsigned long len)
diff --git a/ref-filter.c b/ref-filter.c
index c62f6b4822..5e66b8cd76 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2222,7 +2222,7 @@ void ref_array_clear(struct ref_array *array)
 	used_atom_cnt = 0;
 
 	if (ref_to_worktree_map.worktrees) {
-		hashmap_free_entries(&(ref_to_worktree_map.map),
+		hashmap_clear_and_free(&(ref_to_worktree_map.map),
 					struct ref_to_worktree_entry, ent);
 		free_worktrees(ref_to_worktree_map.worktrees);
 		ref_to_worktree_map.worktrees = NULL;
diff --git a/revision.c b/revision.c
index aa62212040..f27649d45d 100644
--- a/revision.c
+++ b/revision.c
@@ -139,7 +139,7 @@ static void paths_and_oids_clear(struct hashmap *map)
 		free(entry->path);
 	}
 
-	hashmap_free_entries(map, struct path_and_oids_entry, ent);
+	hashmap_clear_and_free(map, struct path_and_oids_entry, ent);
 }
 
 static void paths_and_oids_insert(struct hashmap *map,
diff --git a/sequencer.c b/sequencer.c
index 00acb12496..23a09c3e7a 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -5058,7 +5058,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 
 	oidmap_free(&commit2todo, 1);
 	oidmap_free(&state.commit2label, 1);
-	hashmap_free_entries(&state.labels, struct labels_entry, entry);
+	hashmap_clear_and_free(&state.labels, struct labels_entry, entry);
 	strbuf_release(&state.buf);
 
 	return 0;
@@ -5577,7 +5577,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 	for (i = 0; i < todo_list->nr; i++)
 		free(subjects[i]);
 	free(subjects);
-	hashmap_free_entries(&subject2item, struct subject2item_entry, entry);
+	hashmap_clear_and_free(&subject2item, struct subject2item_entry, entry);
 
 	clear_commit_todo_item(&commit_todo);
 
diff --git a/submodule-config.c b/submodule-config.c
index c569e22aa3..f502505566 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -103,8 +103,8 @@ static void submodule_cache_clear(struct submodule_cache *cache)
 				ent /* member name */)
 		free_one_config(entry);
 
-	hashmap_free_entries(&cache->for_path, struct submodule_entry, ent);
-	hashmap_free_entries(&cache->for_name, struct submodule_entry, ent);
+	hashmap_clear_and_free(&cache->for_path, struct submodule_entry, ent);
+	hashmap_clear_and_free(&cache->for_name, struct submodule_entry, ent);
 	cache->initialized = 0;
 	cache->gitmodules_read = 0;
 }
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index f38706216f..2475663b49 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -110,7 +110,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 				hashmap_add(&map, &entries[i]->ent);
 			}
 
-			hashmap_free(&map);
+			hashmap_clear(&map);
 		}
 	} else {
 		/* test map lookups */
@@ -130,7 +130,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 			}
 		}
 
-		hashmap_free(&map);
+		hashmap_clear(&map);
 	}
 }
 
@@ -262,6 +262,6 @@ int cmd__hashmap(int argc, const char **argv)
 	}
 
 	strbuf_release(&line);
-	hashmap_free_entries(&map, struct test_entry, ent);
+	hashmap_clear_and_free(&map, struct test_entry, ent);
 	return 0;
 }
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 06/15] strmap: new utility functions
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (4 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 07/15] strmap: add more " Elijah Newren via GitGitGadget
                           ` (10 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add strmap as a new struct and associated utility functions,
specifically for hashmaps that map strings to some value.  The API is
taken directly from Peff's proposal at
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Note that similar string-list, I have a strdup_strings setting.
However, unlike string-list, strmap_init() does not take a parameter for
this setting and instead automatically sets it to 1; callers who want to
control this detail need to instead call strmap_init_with_options().
(Future patches will add additional parameters to
strmap_init_with_options()).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Makefile |  1 +
 strmap.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 strmap.h | 65 +++++++++++++++++++++++++++++++++++++
 3 files changed, 165 insertions(+)
 create mode 100644 strmap.c
 create mode 100644 strmap.h

diff --git a/Makefile b/Makefile
index 95571ee3fc..777a34c01c 100644
--- a/Makefile
+++ b/Makefile
@@ -1000,6 +1000,7 @@ LIB_OBJS += stable-qsort.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
+LIB_OBJS += strmap.o
 LIB_OBJS += strvec.o
 LIB_OBJS += sub-process.o
 LIB_OBJS += submodule-config.o
diff --git a/strmap.c b/strmap.c
new file mode 100644
index 0000000000..53f284eb20
--- /dev/null
+++ b/strmap.c
@@ -0,0 +1,99 @@
+#include "git-compat-util.h"
+#include "strmap.h"
+
+int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
+		     const struct hashmap_entry *entry1,
+		     const struct hashmap_entry *entry2,
+		     const void *keydata)
+{
+	const struct strmap_entry *e1, *e2;
+
+	e1 = container_of(entry1, const struct strmap_entry, ent);
+	e2 = container_of(entry2, const struct strmap_entry, ent);
+	return strcmp(e1->key, e2->key);
+}
+
+static struct strmap_entry *find_strmap_entry(struct strmap *map,
+					      const char *str)
+{
+	struct strmap_entry entry;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.key = str;
+	return hashmap_get_entry(&map->map, &entry, ent, NULL);
+}
+
+void strmap_init(struct strmap *map)
+{
+	strmap_init_with_options(map, 1);
+}
+
+void strmap_init_with_options(struct strmap *map,
+			      int strdup_strings)
+{
+	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
+	map->strdup_strings = strdup_strings;
+}
+
+static void strmap_free_entries_(struct strmap *map, int free_values)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+
+	if (!map)
+		return;
+
+	/*
+	 * We need to iterate over the hashmap entries and free
+	 * e->key and e->value ourselves; hashmap has no API to
+	 * take care of that for us.  Since we're already iterating over
+	 * the hashmap, though, might as well free e too and avoid the need
+	 * to make some call into the hashmap API to do that.
+	 */
+	hashmap_for_each_entry(&map->map, &iter, e, ent) {
+		if (free_values)
+			free(e->value);
+		if (map->strdup_strings)
+			free((char*)e->key);
+		free(e);
+	}
+}
+
+void strmap_clear(struct strmap *map, int free_values)
+{
+	strmap_free_entries_(map, free_values);
+	hashmap_clear(&map->map);
+}
+
+void *strmap_put(struct strmap *map, const char *str, void *data)
+{
+	struct strmap_entry *entry = find_strmap_entry(map, str);
+	void *old = NULL;
+
+	if (entry) {
+		old = entry->value;
+		entry->value = data;
+	} else {
+		const char *key = str;
+
+		entry = xmalloc(sizeof(*entry));
+		hashmap_entry_init(&entry->ent, strhash(str));
+
+		if (map->strdup_strings)
+			key = xstrdup(str);
+		entry->key = key;
+		entry->value = data;
+		hashmap_add(&map->map, &entry->ent);
+	}
+	return old;
+}
+
+void *strmap_get(struct strmap *map, const char *str)
+{
+	struct strmap_entry *entry = find_strmap_entry(map, str);
+	return entry ? entry->value : NULL;
+}
+
+int strmap_contains(struct strmap *map, const char *str)
+{
+	return find_strmap_entry(map, str) != NULL;
+}
diff --git a/strmap.h b/strmap.h
new file mode 100644
index 0000000000..96888c23ad
--- /dev/null
+++ b/strmap.h
@@ -0,0 +1,65 @@
+#ifndef STRMAP_H
+#define STRMAP_H
+
+#include "hashmap.h"
+
+struct strmap {
+	struct hashmap map;
+	unsigned int strdup_strings:1;
+};
+
+struct strmap_entry {
+	struct hashmap_entry ent;
+	const char *key;
+	void *value;
+};
+
+int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
+		     const struct hashmap_entry *entry1,
+		     const struct hashmap_entry *entry2,
+		     const void *keydata);
+
+#define STRMAP_INIT { \
+			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
+			.strdup_strings = 1,                          \
+		    }
+
+/*
+ * Initialize the members of the strmap.  Any keys added to the strmap will
+ * be strdup'ed with their memory managed by the strmap.
+ */
+void strmap_init(struct strmap *map);
+
+/*
+ * Same as strmap_init, but for those who want to control the memory management
+ * carefully instead of using the default of strdup_strings=1.
+ */
+void strmap_init_with_options(struct strmap *map,
+			      int strdup_strings);
+
+/*
+ * Remove all entries from the map, releasing any allocated resources.
+ */
+void strmap_clear(struct strmap *map, int free_values);
+
+/*
+ * Insert "str" into the map, pointing to "data".
+ *
+ * If an entry for "str" already exists, its data pointer is overwritten, and
+ * the original data pointer returned. Otherwise, returns NULL.
+ */
+void *strmap_put(struct strmap *map, const char *str, void *data);
+
+/*
+ * Return the data pointer mapped by "str", or NULL if the entry does not
+ * exist.
+ */
+void *strmap_get(struct strmap *map, const char *str);
+
+/*
+ * Return non-zero iff "str" is present in the map. This differs from
+ * strmap_get() in that it can distinguish entries with a NULL data pointer.
+ */
+int strmap_contains(struct strmap *map, const char *str);
+
+#endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 07/15] strmap: add more utility functions
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (5 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 06/15] strmap: new utility functions Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
                           ` (9 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This adds a number of additional convienence functions I want/need:
  * strmap_get_size()
  * strmap_empty()
  * strmap_remove()
  * strmap_for_each_entry()
  * strmap_get_entry()

I suspect the first four are self-explanatory.

strmap_get_entry() is similar to strmap_get() except that instead of just
returning the void* value that the string maps to, it returns the
strmap_entry that contains both the string and the void* value (or
NULL if the string isn't in the map).  This is helpful because it avoids
multiple lookups, e.g. in some cases a caller would need to call:
  * strmap_contains() to check that the map has an entry for the string
  * strmap_get() to get the void* value
  * <do some work to update the value>
  * strmap_put() to update/overwrite the value
If the void* pointer returned really is a pointer, then the last step is
unnecessary, but if the void* pointer is just cast to an integer then
strmap_put() will be needed.  In contrast, one can call strmap_get_entry()
and then:
  * check if the string was in the map by whether the pointer is NULL
  * access the value via entry->value
  * directly update entry->value
meaning that we can replace two or three hash table lookups with one.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 20 ++++++++++++++++++++
 strmap.h | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/strmap.c b/strmap.c
index 53f284eb20..829f1bc095 100644
--- a/strmap.c
+++ b/strmap.c
@@ -87,6 +87,11 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 	return old;
 }
 
+struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
+{
+	return find_strmap_entry(map, str);
+}
+
 void *strmap_get(struct strmap *map, const char *str)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
@@ -97,3 +102,18 @@ int strmap_contains(struct strmap *map, const char *str)
 {
 	return find_strmap_entry(map, str) != NULL;
 }
+
+void strmap_remove(struct strmap *map, const char *str, int free_value)
+{
+	struct strmap_entry entry, *ret;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.key = str;
+	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
+	if (!ret)
+		return;
+	if (free_value)
+		free(ret->value);
+	if (map->strdup_strings)
+		free((char*)ret->key);
+	free(ret);
+}
diff --git a/strmap.h b/strmap.h
index 96888c23ad..f74bc582e4 100644
--- a/strmap.h
+++ b/strmap.h
@@ -50,6 +50,12 @@ void strmap_clear(struct strmap *map, int free_values);
  */
 void *strmap_put(struct strmap *map, const char *str, void *data);
 
+/*
+ * Return the strmap_entry mapped by "str", or NULL if there is not such
+ * an item in map.
+ */
+struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str);
+
 /*
  * Return the data pointer mapped by "str", or NULL if the entry does not
  * exist.
@@ -62,4 +68,32 @@ void *strmap_get(struct strmap *map, const char *str);
  */
 int strmap_contains(struct strmap *map, const char *str);
 
+/*
+ * Remove the given entry from the strmap.  If the string isn't in the
+ * strmap, the map is not altered.
+ */
+void strmap_remove(struct strmap *map, const char *str, int free_value);
+
+/*
+ * Return how many entries the strmap has.
+ */
+static inline unsigned int strmap_get_size(struct strmap *map)
+{
+	return hashmap_get_size(&map->map);
+}
+
+/*
+ * Return whether the strmap is empty.
+ */
+static inline int strmap_empty(struct strmap *map)
+{
+	return strmap_get_size(map) == 0;
+}
+
+/*
+ * iterate through @map using @iter, @var is a pointer to a type strmap_entry
+ */
+#define strmap_for_each_entry(mystrmap, iter, var)	\
+	hashmap_for_each_entry(&(mystrmap)->map, iter, var, ent)
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 08/15] strmap: enable faster clearing and reusing of strmaps
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (6 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 07/15] strmap: add more " Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
                           ` (8 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

When strmaps are used heavily, such as is done by my new merge-ort
algorithm, and strmaps need to be cleared but then re-used (because of
e.g. picking multiple commits to cherry-pick, or due to a recursive
merge having several different merges while recursing), free-ing and
reallocating map->table repeatedly can add up in time, especially since
it will likely be reallocated to a much smaller size but the previous
merge provides a good guide to the right size to use for the next merge.

Introduce strmap_partial_clear() to take advantage of this type of
situation; it will act similar to strmap_clear() except that
map->table's entries are zeroed instead of map->table being free'd.
Making use of this function reduced the cost of
clear_or_reinit_internal_opts() by about 20% in mert-ort, and dropped
the overall runtime of my rebase testcase by just under 2%.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 6 ++++++
 strmap.h | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/strmap.c b/strmap.c
index 829f1bc095..c410c5241a 100644
--- a/strmap.c
+++ b/strmap.c
@@ -64,6 +64,12 @@ void strmap_clear(struct strmap *map, int free_values)
 	hashmap_clear(&map->map);
 }
 
+void strmap_partial_clear(struct strmap *map, int free_values)
+{
+	strmap_free_entries_(map, free_values);
+	hashmap_partial_clear(&map->map);
+}
+
 void *strmap_put(struct strmap *map, const char *str, void *data)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
diff --git a/strmap.h b/strmap.h
index f74bc582e4..c14fcee148 100644
--- a/strmap.h
+++ b/strmap.h
@@ -42,6 +42,12 @@ void strmap_init_with_options(struct strmap *map,
  */
 void strmap_clear(struct strmap *map, int free_values);
 
+/*
+ * Similar to strmap_clear() but leaves map->map->table allocated and
+ * pre-sized so that subsequent uses won't need as many rehashings.
+ */
+void strmap_partial_clear(struct strmap *map, int free_values);
+
 /*
  * Insert "str" into the map, pointing to "data".
  *
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 09/15] strmap: add functions facilitating use as a string->int map
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (7 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget
                           ` (7 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Although strmap could be used as a string->int map, one either had to
allocate an int for every entry and then deallocate later, or one had to
do a bunch of casting between (void*) and (intptr_t).

Add some special functions that do the casting.  Also, rename put->set
for such wrapper functions since 'put' implied there may be some
deallocation needed if the string was already found in the map, which
isn't the case when we're storing an int value directly in the void*
slot instead of using the void* slot as a pointer to data.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 11 +++++++
 strmap.h | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+)

diff --git a/strmap.c b/strmap.c
index c410c5241a..0d10a884b5 100644
--- a/strmap.c
+++ b/strmap.c
@@ -123,3 +123,14 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		free((char*)ret->key);
 	free(ret);
 }
+
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
+{
+	struct strmap_entry *entry = find_strmap_entry(&map->map, str);
+	if (entry) {
+		intptr_t *whence = (intptr_t*)&entry->value;
+		*whence += amt;
+	}
+	else
+		strintmap_set(map, str, map->default_value + amt);
+}
diff --git a/strmap.h b/strmap.h
index c14fcee148..56a5cdb864 100644
--- a/strmap.h
+++ b/strmap.h
@@ -23,6 +23,10 @@ int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
 			.strdup_strings = 1,                          \
 		    }
+#define STRINTMAP_INIT { \
+			.map = STRMAP_INIT,   \
+			.default_value = 0,   \
+		       }
 
 /*
  * Initialize the members of the strmap.  Any keys added to the strmap will
@@ -102,4 +106,94 @@ static inline int strmap_empty(struct strmap *map)
 #define strmap_for_each_entry(mystrmap, iter, var)	\
 	hashmap_for_each_entry(&(mystrmap)->map, iter, var, ent)
 
+
+/*
+ * strintmap:
+ *    A map of string -> int, typecasting the void* of strmap to an int.
+ *
+ * Primary differences:
+ *    1) Since the void* value is just an int in disguise, there is no value
+ *       to free.  (Thus one fewer argument to strintmap_clear)
+ *    2) strintmap_get() returns an int, or returns the default_value if the
+ *       key is not found in the strintmap.
+ *    3) No strmap_put() equivalent; strintmap_set() and strintmap_incr()
+ *       instead.
+ */
+
+struct strintmap {
+	struct strmap map;
+	int default_value;
+};
+
+#define strintmap_for_each_entry(mystrmap, iter, var)	\
+	strmap_for_each_entry(&(mystrmap)->map, iter, var)
+
+static inline void strintmap_init(struct strintmap *map, int default_value)
+{
+	strmap_init(&map->map);
+	map->default_value = default_value;
+}
+
+static inline void strintmap_init_with_options(struct strintmap *map,
+					       int default_value,
+					       int strdup_strings)
+{
+	strmap_init_with_options(&map->map, strdup_strings);
+	map->default_value = default_value;
+}
+
+static inline void strintmap_clear(struct strintmap *map)
+{
+	strmap_clear(&map->map, 0);
+}
+
+static inline void strintmap_partial_clear(struct strintmap *map)
+{
+	strmap_partial_clear(&map->map, 0);
+}
+
+static inline int strintmap_contains(struct strintmap *map, const char *str)
+{
+	return strmap_contains(&map->map, str);
+}
+
+static inline void strintmap_remove(struct strintmap *map, const char *str)
+{
+	return strmap_remove(&map->map, str, 0);
+}
+
+static inline int strintmap_empty(struct strintmap *map)
+{
+	return strmap_empty(&map->map);
+}
+
+static inline unsigned int strintmap_get_size(struct strintmap *map)
+{
+	return strmap_get_size(&map->map);
+}
+
+/*
+ * Returns the value for str in the map.  If str isn't found in the map,
+ * the map's default_value is returned.
+ */
+static inline int strintmap_get(struct strintmap *map, const char *str)
+{
+	struct strmap_entry *result = strmap_get_entry(&map->map, str);
+	if (!result)
+		return map->default_value;
+	return (intptr_t)result->value;
+}
+
+static inline void strintmap_set(struct strintmap *map, const char *str,
+				 intptr_t v)
+{
+	strmap_put(&map->map, str, (void *)v);
+}
+
+/*
+ * Increment the value for str by amt.  If str isn't in the map, add it and
+ * set its value to default_value + amt.
+ */
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 10/15] strmap: split create_entry() out of strmap_put()
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (8 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget
                           ` (6 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This will facilitate adding entries to a strmap subtype in ways that
differ slightly from that of strmap_put().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/strmap.c b/strmap.c
index 0d10a884b5..dc84c57c07 100644
--- a/strmap.c
+++ b/strmap.c
@@ -70,27 +70,36 @@ void strmap_partial_clear(struct strmap *map, int free_values)
 	hashmap_partial_clear(&map->map);
 }
 
+static struct strmap_entry *create_entry(struct strmap *map,
+					 const char *str,
+					 void *data)
+{
+	struct strmap_entry *entry;
+	const char *key = str;
+
+	entry = xmalloc(sizeof(*entry));
+	hashmap_entry_init(&entry->ent, strhash(str));
+
+	if (map->strdup_strings)
+		key = xstrdup(str);
+	entry->key = key;
+	entry->value = data;
+	return entry;
+}
+
 void *strmap_put(struct strmap *map, const char *str, void *data)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
-	void *old = NULL;
 
 	if (entry) {
-		old = entry->value;
+		void *old = entry->value;
 		entry->value = data;
-	} else {
-		const char *key = str;
-
-		entry = xmalloc(sizeof(*entry));
-		hashmap_entry_init(&entry->ent, strhash(str));
-
-		if (map->strdup_strings)
-			key = xstrdup(str);
-		entry->key = key;
-		entry->value = data;
-		hashmap_add(&map->map, &entry->ent);
+		return old;
 	}
-	return old;
+
+	entry = create_entry(map, str, data);
+	hashmap_add(&map->map, &entry->ent);
+	return NULL;
 }
 
 struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 11/15] strmap: add a strset sub-type
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (9 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
                           ` (5 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Similar to adding strintmap for special-casing a string -> int mapping,
add a strset type for cases where we really are only interested in using
strmap for storing a set rather than a mapping.  In this case, we'll
always just store NULL for the value but the different struct type makes
it clearer than code comments how a variable is intended to be used.

The difference in usage also results in some differences in API: a few
things that aren't necessary or meaningful are dropped (namely, the
free_values argument to *_clear(), and the *_get() function), and
strset_add() is chosen as the API instead of strset_put().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 17 +++++++++++++++
 strmap.h | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/strmap.c b/strmap.c
index dc84c57c07..3784865745 100644
--- a/strmap.c
+++ b/strmap.c
@@ -143,3 +143,20 @@ void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
 	else
 		strintmap_set(map, str, map->default_value + amt);
 }
+
+int strset_add(struct strset *set, const char *str)
+{
+	/*
+	 * Cannot use strmap_put() because it'll return NULL in both cases:
+	 *   - cannot find str: NULL means "not found"
+	 *   - does find str: NULL is the value associated with str
+	 */
+	struct strmap_entry *entry = find_strmap_entry(&set->map, str);
+
+	if (entry)
+		return 0;
+
+	entry = create_entry(&set->map, str, NULL);
+	hashmap_add(&set->map.map, &entry->ent);
+	return 1;
+}
diff --git a/strmap.h b/strmap.h
index 56a5cdb864..c8c4d7c932 100644
--- a/strmap.h
+++ b/strmap.h
@@ -27,6 +27,7 @@ int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 			.map = STRMAP_INIT,   \
 			.default_value = 0,   \
 		       }
+#define STRSET_INIT { .map = STRMAP_INIT }
 
 /*
  * Initialize the members of the strmap.  Any keys added to the strmap will
@@ -196,4 +197,66 @@ static inline void strintmap_set(struct strintmap *map, const char *str,
  */
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
 
+/*
+ * strset:
+ *    A set of strings.
+ *
+ * Primary differences with strmap:
+ *    1) The value is always NULL, and ignored.  As there is no value to free,
+ *       there is one fewer argument to strset_clear
+ *    2) No strset_get() because there is no value.
+ *    3) No strset_put(); use strset_add() instead.
+ */
+
+struct strset {
+	struct strmap map;
+};
+
+#define strset_for_each_entry(mystrset, iter, var)	\
+	strmap_for_each_entry(&(mystrset)->map, iter, var)
+
+static inline void strset_init(struct strset *set)
+{
+	strmap_init(&set->map);
+}
+
+static inline void strset_init_with_options(struct strset *set,
+					    int strdup_strings)
+{
+	strmap_init_with_options(&set->map, strdup_strings);
+}
+
+static inline void strset_clear(struct strset *set)
+{
+	strmap_clear(&set->map, 0);
+}
+
+static inline void strset_partial_clear(struct strset *set)
+{
+	strmap_partial_clear(&set->map, 0);
+}
+
+static inline int strset_contains(struct strset *set, const char *str)
+{
+	return strmap_contains(&set->map, str);
+}
+
+static inline void strset_remove(struct strset *set, const char *str)
+{
+	return strmap_remove(&set->map, str, 0);
+}
+
+static inline int strset_empty(struct strset *set)
+{
+	return strmap_empty(&set->map);
+}
+
+static inline unsigned int strset_get_size(struct strset *set)
+{
+	return strmap_get_size(&set->map);
+}
+
+/* Returns 1 if str is added to the set; returns 0 if str was already in set */
+int strset_add(struct strset *set, const char *str);
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (10 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-11 17:33           ` Phillip Wood
  2020-11-06  0:24         ` [PATCH v5 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
                           ` (4 subsequent siblings)
  16 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For heavy users of strmaps, allowing the keys and entries to be
allocated from a memory pool can provide significant overhead savings.
Add an option to strmap_init_with_options() to specify a memory pool.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 31 ++++++++++++++++++++++---------
 strmap.h | 11 ++++++++---
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/strmap.c b/strmap.c
index 3784865745..139afb9d4b 100644
--- a/strmap.c
+++ b/strmap.c
@@ -1,5 +1,6 @@
 #include "git-compat-util.h"
 #include "strmap.h"
+#include "mem-pool.h"
 
 int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 		     const struct hashmap_entry *entry1,
@@ -24,13 +25,15 @@ static struct strmap_entry *find_strmap_entry(struct strmap *map,
 
 void strmap_init(struct strmap *map)
 {
-	strmap_init_with_options(map, 1);
+	strmap_init_with_options(map, NULL, 1);
 }
 
 void strmap_init_with_options(struct strmap *map,
+			      struct mem_pool *pool,
 			      int strdup_strings)
 {
 	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
+	map->pool = pool;
 	map->strdup_strings = strdup_strings;
 }
 
@@ -42,6 +45,10 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	if (!map)
 		return;
 
+	if (!free_values && map->pool)
+		/* Memory other than util is owned by and freed with the pool */
+		return;
+
 	/*
 	 * We need to iterate over the hashmap entries and free
 	 * e->key and e->value ourselves; hashmap has no API to
@@ -52,9 +59,11 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	hashmap_for_each_entry(&map->map, &iter, e, ent) {
 		if (free_values)
 			free(e->value);
-		if (map->strdup_strings)
-			free((char*)e->key);
-		free(e);
+		if (!map->pool) {
+			if (map->strdup_strings)
+				free((char*)e->key);
+			free(e);
+		}
 	}
 }
 
@@ -77,11 +86,13 @@ static struct strmap_entry *create_entry(struct strmap *map,
 	struct strmap_entry *entry;
 	const char *key = str;
 
-	entry = xmalloc(sizeof(*entry));
+	entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
+			  : xmalloc(sizeof(*entry));
 	hashmap_entry_init(&entry->ent, strhash(str));
 
 	if (map->strdup_strings)
-		key = xstrdup(str);
+		key = map->pool ? mem_pool_strdup(map->pool, str)
+				: xstrdup(str);
 	entry->key = key;
 	entry->value = data;
 	return entry;
@@ -128,9 +139,11 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		return;
 	if (free_value)
 		free(ret->value);
-	if (map->strdup_strings)
-		free((char*)ret->key);
-	free(ret);
+	if (!map->pool) {
+		if (map->strdup_strings)
+			free((char*)ret->key);
+		free(ret);
+	}
 }
 
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
diff --git a/strmap.h b/strmap.h
index c8c4d7c932..dda928703d 100644
--- a/strmap.h
+++ b/strmap.h
@@ -3,8 +3,10 @@
 
 #include "hashmap.h"
 
+struct mempool;
 struct strmap {
 	struct hashmap map;
+	struct mem_pool *pool;
 	unsigned int strdup_strings:1;
 };
 
@@ -37,9 +39,10 @@ void strmap_init(struct strmap *map);
 
 /*
  * Same as strmap_init, but for those who want to control the memory management
- * carefully instead of using the default of strdup_strings=1.
+ * carefully instead of using the default of strdup_strings=1 and pool=NULL.
  */
 void strmap_init_with_options(struct strmap *map,
+			      struct mem_pool *pool,
 			      int strdup_strings);
 
 /*
@@ -137,9 +140,10 @@ static inline void strintmap_init(struct strintmap *map, int default_value)
 
 static inline void strintmap_init_with_options(struct strintmap *map,
 					       int default_value,
+					       struct mem_pool *pool,
 					       int strdup_strings)
 {
-	strmap_init_with_options(&map->map, strdup_strings);
+	strmap_init_with_options(&map->map, pool, strdup_strings);
 	map->default_value = default_value;
 }
 
@@ -221,9 +225,10 @@ static inline void strset_init(struct strset *set)
 }
 
 static inline void strset_init_with_options(struct strset *set,
+					    struct mem_pool *pool,
 					    int strdup_strings)
 {
-	strmap_init_with_options(&set->map, strdup_strings);
+	strmap_init_with_options(&set->map, pool, strdup_strings);
 }
 
 static inline void strset_clear(struct strset *set)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (11 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
                           ` (3 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

By default, we do not use a mempool and strdup_strings is true; in this
case, we can avoid both an extra allocation and an extra free by just
over-allocating for the strmap_entry leaving enough space at the end to
copy the key.  FLEXPTR_ALLOC_STR exists for exactly this purpose, so
make use of it.

Also, adjust the case when we are using a memory pool and strdup_strings
is true to just do one allocation from the memory pool instead of two so
that the strmap_clear() and strmap_remove() code can just avoid freeing
the key in all cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 35 +++++++++++++++++++----------------
 strmap.h |  1 +
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/strmap.c b/strmap.c
index 139afb9d4b..4fb9f6100e 100644
--- a/strmap.c
+++ b/strmap.c
@@ -59,11 +59,8 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	hashmap_for_each_entry(&map->map, &iter, e, ent) {
 		if (free_values)
 			free(e->value);
-		if (!map->pool) {
-			if (map->strdup_strings)
-				free((char*)e->key);
+		if (!map->pool)
 			free(e);
-		}
 	}
 }
 
@@ -84,16 +81,25 @@ static struct strmap_entry *create_entry(struct strmap *map,
 					 void *data)
 {
 	struct strmap_entry *entry;
-	const char *key = str;
 
-	entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
-			  : xmalloc(sizeof(*entry));
+	if (map->strdup_strings) {
+		if (!map->pool) {
+			FLEXPTR_ALLOC_STR(entry, key, str);
+		} else {
+			size_t len = st_add(strlen(str), 1); /* include NUL */
+			entry = mem_pool_alloc(map->pool,
+					       st_add(sizeof(*entry), len));
+			memcpy(entry + 1, str, len);
+			entry->key = (void *)(entry + 1);
+		}
+	} else if (!map->pool) {
+		entry = xmalloc(sizeof(*entry));
+	} else {
+		entry = mem_pool_alloc(map->pool, sizeof(*entry));
+	}
 	hashmap_entry_init(&entry->ent, strhash(str));
-
-	if (map->strdup_strings)
-		key = map->pool ? mem_pool_strdup(map->pool, str)
-				: xstrdup(str);
-	entry->key = key;
+	if (!map->strdup_strings)
+		entry->key = str;
 	entry->value = data;
 	return entry;
 }
@@ -139,11 +145,8 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		return;
 	if (free_value)
 		free(ret->value);
-	if (!map->pool) {
-		if (map->strdup_strings)
-			free((char*)ret->key);
+	if (!map->pool)
 		free(ret);
-	}
 }
 
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
diff --git a/strmap.h b/strmap.h
index dda928703d..a99011df25 100644
--- a/strmap.h
+++ b/strmap.h
@@ -14,6 +14,7 @@ struct strmap_entry {
 	struct hashmap_entry ent;
 	const char *key;
 	void *value;
+	/* strmap_entry may be allocated extra space to store the key at end */
 };
 
 int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (12 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  0:24         ` [PATCH v5 15/15] shortlog: use strset from strmap.h Elijah Newren via GitGitGadget
                           ` (2 subsequent siblings)
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Now that hashamp has lazy initialization and a HASHMAP_INIT macro,
hashmaps allocated on the stack can be initialized without a call to
hashmap_init() and in some cases makes the code a bit shorter.  Convert
some callsites over to take advantage of this.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 attr.c                  | 26 ++++++++------------------
 bloom.c                 |  3 +--
 builtin/difftool.c      |  9 ++++-----
 range-diff.c            |  4 +---
 revision.c              |  9 +--------
 t/helper/test-hashmap.c |  3 +--
 6 files changed, 16 insertions(+), 38 deletions(-)

diff --git a/attr.c b/attr.c
index a826b2ef1f..4ef85d668b 100644
--- a/attr.c
+++ b/attr.c
@@ -52,13 +52,6 @@ static inline void hashmap_unlock(struct attr_hashmap *map)
 	pthread_mutex_unlock(&map->mutex);
 }
 
-/*
- * The global dictionary of all interned attributes.  This
- * is a singleton object which is shared between threads.
- * Access to this dictionary must be surrounded with a mutex.
- */
-static struct attr_hashmap g_attr_hashmap;
-
 /* The container for objects stored in "struct attr_hashmap" */
 struct attr_hash_entry {
 	struct hashmap_entry ent;
@@ -80,11 +73,14 @@ static int attr_hash_entry_cmp(const void *unused_cmp_data,
 	return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
 }
 
-/* Initialize an 'attr_hashmap' object */
-static void attr_hashmap_init(struct attr_hashmap *map)
-{
-	hashmap_init(&map->map, attr_hash_entry_cmp, NULL, 0);
-}
+/*
+ * The global dictionary of all interned attributes.  This
+ * is a singleton object which is shared between threads.
+ * Access to this dictionary must be surrounded with a mutex.
+ */
+static struct attr_hashmap g_attr_hashmap = {
+	HASHMAP_INIT(attr_hash_entry_cmp, NULL)
+};
 
 /*
  * Retrieve the 'value' stored in a hashmap given the provided 'key'.
@@ -96,9 +92,6 @@ static void *attr_hashmap_get(struct attr_hashmap *map,
 	struct attr_hash_entry k;
 	struct attr_hash_entry *e;
 
-	if (!map->map.tablesize)
-		attr_hashmap_init(map);
-
 	hashmap_entry_init(&k.ent, memhash(key, keylen));
 	k.key = key;
 	k.keylen = keylen;
@@ -114,9 +107,6 @@ static void attr_hashmap_add(struct attr_hashmap *map,
 {
 	struct attr_hash_entry *e;
 
-	if (!map->map.tablesize)
-		attr_hashmap_init(map);
-
 	e = xmalloc(sizeof(struct attr_hash_entry));
 	hashmap_entry_init(&e->ent, memhash(key, keylen));
 	e->key = key;
diff --git a/bloom.c b/bloom.c
index 719c313a1c..b176f28f53 100644
--- a/bloom.c
+++ b/bloom.c
@@ -229,10 +229,9 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	diffcore_std(&diffopt);
 
 	if (diff_queued_diff.nr <= settings->max_changed_paths) {
-		struct hashmap pathmap;
+		struct hashmap pathmap = HASHMAP_INIT(pathmap_cmp, NULL);
 		struct pathmap_hash_entry *e;
 		struct hashmap_iter iter;
-		hashmap_init(&pathmap, pathmap_cmp, NULL, 0);
 
 		for (i = 0; i < diff_queued_diff.nr; i++) {
 			const char *path = diff_queued_diff.queue[i]->two->path;
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 7ac432b881..6e18e623fd 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -342,7 +342,10 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	const char *workdir, *tmp;
 	int ret = 0, i;
 	FILE *fp;
-	struct hashmap working_tree_dups, submodules, symlinks2;
+	struct hashmap working_tree_dups = HASHMAP_INIT(working_tree_entry_cmp,
+							NULL);
+	struct hashmap submodules = HASHMAP_INIT(pair_cmp, NULL);
+	struct hashmap symlinks2 = HASHMAP_INIT(pair_cmp, NULL);
 	struct hashmap_iter iter;
 	struct pair_entry *entry;
 	struct index_state wtindex;
@@ -383,10 +386,6 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	rdir_len = rdir.len;
 	wtdir_len = wtdir.len;
 
-	hashmap_init(&working_tree_dups, working_tree_entry_cmp, NULL, 0);
-	hashmap_init(&submodules, pair_cmp, NULL, 0);
-	hashmap_init(&symlinks2, pair_cmp, NULL, 0);
-
 	child.no_stdin = 1;
 	child.git_cmd = 1;
 	child.use_shell = 0;
diff --git a/range-diff.c b/range-diff.c
index befeecae44..b9950f10c8 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -232,11 +232,9 @@ static int patch_util_cmp(const void *dummy, const struct patch_util *a,
 
 static void find_exact_matches(struct string_list *a, struct string_list *b)
 {
-	struct hashmap map;
+	struct hashmap map = HASHMAP_INIT((hashmap_cmp_fn)patch_util_cmp, NULL);
 	int i;
 
-	hashmap_init(&map, (hashmap_cmp_fn)patch_util_cmp, NULL, 0);
-
 	/* First, add the patches of a to a hash map */
 	for (i = 0; i < a->nr; i++) {
 		struct patch_util *util = a->items[i].util;
diff --git a/revision.c b/revision.c
index f27649d45d..c6e169e3eb 100644
--- a/revision.c
+++ b/revision.c
@@ -124,11 +124,6 @@ static int path_and_oids_cmp(const void *hashmap_cmp_fn_data,
 	return strcmp(e1->path, e2->path);
 }
 
-static void paths_and_oids_init(struct hashmap *map)
-{
-	hashmap_init(map, path_and_oids_cmp, NULL, 0);
-}
-
 static void paths_and_oids_clear(struct hashmap *map)
 {
 	struct hashmap_iter iter;
@@ -213,7 +208,7 @@ void mark_trees_uninteresting_sparse(struct repository *r,
 				     struct oidset *trees)
 {
 	unsigned has_interesting = 0, has_uninteresting = 0;
-	struct hashmap map;
+	struct hashmap map = HASHMAP_INIT(path_and_oids_cmp, NULL);
 	struct hashmap_iter map_iter;
 	struct path_and_oids_entry *entry;
 	struct object_id *oid;
@@ -237,8 +232,6 @@ void mark_trees_uninteresting_sparse(struct repository *r,
 	if (!has_uninteresting || !has_interesting)
 		return;
 
-	paths_and_oids_init(&map);
-
 	oidset_iter_init(trees, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		struct tree *tree = lookup_tree(r, oid);
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index 2475663b49..36ff07bd4b 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -151,12 +151,11 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 int cmd__hashmap(int argc, const char **argv)
 {
 	struct strbuf line = STRBUF_INIT;
-	struct hashmap map;
 	int icase;
+	struct hashmap map = HASHMAP_INIT(test_entry_cmp, &icase);
 
 	/* init hash map */
 	icase = argc > 1 && !strcmp("ignorecase", argv[1]);
-	hashmap_init(&map, test_entry_cmp, &icase, 0);
 
 	/* process commands from stdin */
 	while (strbuf_getline(&line, stdin) != EOF) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v5 15/15] shortlog: use strset from strmap.h
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (13 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
@ 2020-11-06  0:24         ` Elijah Newren via GitGitGadget
  2020-11-06  2:00         ` [PATCH v5 00/15] Add struct strmap and associated utility functions Junio C Hamano
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
  16 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-06  0:24 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/shortlog.c | 61 +++-------------------------------------------
 1 file changed, 4 insertions(+), 57 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 83f0a739b4..c52e4ccd19 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -10,6 +10,7 @@
 #include "shortlog.h"
 #include "parse-options.h"
 #include "trailer.h"
+#include "strmap.h"
 
 static char const * const shortlog_usage[] = {
 	N_("git shortlog [<options>] [<revision-range>] [[--] <path>...]"),
@@ -169,60 +170,6 @@ static void read_from_stdin(struct shortlog *log)
 	strbuf_release(&oneline);
 }
 
-struct strset_item {
-	struct hashmap_entry ent;
-	char value[FLEX_ARRAY];
-};
-
-struct strset {
-	struct hashmap map;
-};
-
-#define STRSET_INIT { { NULL } }
-
-static int strset_item_hashcmp(const void *hash_data,
-			       const struct hashmap_entry *entry,
-			       const struct hashmap_entry *entry_or_key,
-			       const void *keydata)
-{
-	const struct strset_item *a, *b;
-
-	a = container_of(entry, const struct strset_item, ent);
-	if (keydata)
-		return strcmp(a->value, keydata);
-
-	b = container_of(entry_or_key, const struct strset_item, ent);
-	return strcmp(a->value, b->value);
-}
-
-/*
- * Adds "str" to the set if it was not already present; returns true if it was
- * already there.
- */
-static int strset_check_and_add(struct strset *ss, const char *str)
-{
-	unsigned int hash = strhash(str);
-	struct strset_item *item;
-
-	if (!ss->map.table)
-		hashmap_init(&ss->map, strset_item_hashcmp, NULL, 0);
-
-	if (hashmap_get_from_hash(&ss->map, hash, str))
-		return 1;
-
-	FLEX_ALLOC_STR(item, value, str);
-	hashmap_entry_init(&item->ent, hash);
-	hashmap_add(&ss->map, &item->ent);
-	return 0;
-}
-
-static void strset_clear(struct strset *ss)
-{
-	if (!ss->map.table)
-		return;
-	hashmap_clear_and_free(&ss->map, struct strset_item, ent);
-}
-
 static void insert_records_from_trailers(struct shortlog *log,
 					 struct strset *dups,
 					 struct commit *commit,
@@ -253,7 +200,7 @@ static void insert_records_from_trailers(struct shortlog *log,
 		if (!parse_ident(log, &ident, value))
 			value = ident.buf;
 
-		if (strset_check_and_add(dups, value))
+		if (!strset_add(dups, value))
 			continue;
 		insert_one_record(log, value, oneline);
 	}
@@ -291,7 +238,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 				      log->email ? "%aN <%aE>" : "%aN",
 				      &ident, &ctx);
 		if (!HAS_MULTI_BITS(log->groups) ||
-		    !strset_check_and_add(&dups, ident.buf))
+		    strset_add(&dups, ident.buf))
 			insert_one_record(log, ident.buf, oneline_str);
 	}
 	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
@@ -300,7 +247,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 				      log->email ? "%cN <%cE>" : "%cN",
 				      &ident, &ctx);
 		if (!HAS_MULTI_BITS(log->groups) ||
-		    !strset_check_and_add(&dups, ident.buf))
+		    strset_add(&dups, ident.buf))
 			insert_one_record(log, ident.buf, oneline_str);
 	}
 	if (log->groups & SHORTLOG_GROUP_TRAILER) {
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v5 00/15] Add struct strmap and associated utility functions
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (14 preceding siblings ...)
  2020-11-06  0:24         ` [PATCH v5 15/15] shortlog: use strset from strmap.h Elijah Newren via GitGitGadget
@ 2020-11-06  2:00         ` Junio C Hamano
  2020-11-06  2:42           ` Elijah Newren
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
  16 siblings, 1 reply; 144+ messages in thread
From: Junio C Hamano @ 2020-11-06  2:00 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Jeff King, Elijah Newren

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Changes since v4:
> ...
>  * Add a patch which updates shortlog to use the new strset API.

This makes my life so much simpler ;-)

Would the implementation be very different from Peff's that you can
take the authorship?  Thanks.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v5 00/15] Add struct strmap and associated utility functions
  2020-11-06  2:00         ` [PATCH v5 00/15] Add struct strmap and associated utility functions Junio C Hamano
@ 2020-11-06  2:42           ` Elijah Newren
  2020-11-06  2:48             ` Jeff King
  0 siblings, 1 reply; 144+ messages in thread
From: Elijah Newren @ 2020-11-06  2:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jeff King

On Thu, Nov 5, 2020 at 6:00 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Changes since v4:
> > ...
> >  * Add a patch which updates shortlog to use the new strset API.
>
> This makes my life so much simpler ;-)
>
> Would the implementation be very different from Peff's that you can
> take the authorship?  Thanks.

Yes; I didn't use his patch, I simply implemented what was needed from
scratch.  I'm not attached to being author of this though; the changes
were trivial.  Feel free to change as you see fit.


If more detail is needed...

There's only two things in my patch: (1) deleting a bunch of code, (2)
search and replace strset_check_and_add() with !strset_add().

His patch has three things: (1) deleting a bunch of code, (2)
introducing strset_dup() [which may have been a copy of my
implementation of strset_check_and_add() from an earlier round of the
series; the code is identical to my implementation, but it's only a
few lines so he might have just reimplemented it identically], (3)
search and replace strset_check_and_add() with strset_dup().

If I were to modify his patch into mine (which I didn't do), it'd
require two things: deleting the strdup() definition and still doing a
search and replace.  In other words, it'd be approximately equivalent
work to just doing the patch from scratch.

Further, I wrote a patch that was nearly the same as my current
submission a few days ago, but it used my old strset_check_and_add().
It triggered some weird windows bug that I think was an infrastructure
flake, but I was worried at the time that it'd require familiarity
with shortlog and its tests to address.  Since I didn't think my
series really depended on that change (shortlog could change to take
advantage of the new strset later), I just dropped it.  Then after
further reviews, the series changed a bit more, and Peff at the end
added a patch to reintroduce strset_check_and_add() with a different
name and use it, then you suggested to modify strset_add() so it can
just be used directly.

So, at the end, taking my existing patch that pre-dated his submission
and tweaking it was the easiest route for me.  I didn't actually look
at his latest patch until after you asked if it was okay for me to
take the authorship.  I see it as two similar from-scratch
implementations that were nearly trivial in either event.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v5 00/15] Add struct strmap and associated utility functions
  2020-11-06  2:42           ` Elijah Newren
@ 2020-11-06  2:48             ` Jeff King
  2020-11-06 17:32               ` Junio C Hamano
  0 siblings, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-11-06  2:48 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Nov 05, 2020 at 06:42:38PM -0800, Elijah Newren wrote:

> > > Changes since v4:
> > > ...
> > >  * Add a patch which updates shortlog to use the new strset API.
> >
> > This makes my life so much simpler ;-)
> >
> > Would the implementation be very different from Peff's that you can
> > take the authorship?  Thanks.
> 
> Yes; I didn't use his patch, I simply implemented what was needed from
> scratch.  I'm not attached to being author of this though; the changes
> were trivial.  Feel free to change as you see fit.

Yeah, I am fine either way with the authorship here. The patch is
trivial, and I was pretty sure you had written the same or similar
already. My main point in posting it was to push it over the finish line
so we didn't forget. ;)

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v5 00/15] Add struct strmap and associated utility functions
  2020-11-06  2:48             ` Jeff King
@ 2020-11-06 17:32               ` Junio C Hamano
  0 siblings, 0 replies; 144+ messages in thread
From: Junio C Hamano @ 2020-11-06 17:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Thu, Nov 05, 2020 at 06:42:38PM -0800, Elijah Newren wrote:
>
>> > > Changes since v4:
>> > > ...
>> > >  * Add a patch which updates shortlog to use the new strset API.
>> >
>> > This makes my life so much simpler ;-)
>> >
>> > Would the implementation be very different from Peff's that you can
>> > take the authorship?  Thanks.
>> 
>> Yes; I didn't use his patch, I simply implemented what was needed from
>> scratch.  I'm not attached to being author of this though; the changes
>> were trivial.  Feel free to change as you see fit.
>
> Yeah, I am fine either way with the authorship here. The patch is
> trivial, and I was pretty sure you had written the same or similar
> already. My main point in posting it was to push it over the finish line
> so we didn't forget. ;)

Yes.  I just was double-checking in case Elijah forgot, as I
couldn't tell if it was deliberate.  The way you two agree on
is the best for me, too.

Thanks.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool
  2020-11-06  0:24         ` [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
@ 2020-11-11 17:33           ` Phillip Wood
  2020-11-11 18:49             ` Elijah Newren
  2020-11-11 19:01             ` Jeff King
  0 siblings, 2 replies; 144+ messages in thread
From: Phillip Wood @ 2020-11-11 17:33 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Jeff King, Elijah Newren

Hi Elijah

On 06/11/2020 00:24, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> For heavy users of strmaps, allowing the keys and entries to be
> allocated from a memory pool can provide significant overhead savings.
> Add an option to strmap_init_with_options() to specify a memory pool.
> [...]
> diff --git a/strmap.h b/strmap.h
> index c8c4d7c932..dda928703d 100644
> --- a/strmap.h
> +++ b/strmap.h
> @@ -3,8 +3,10 @@
>   
>   #include "hashmap.h"
>   
> +struct mempool;

I think this is a typo - I assume you wanted to declare `struct 
mem_pool` but it's not strictly necessary as you're only adding a 
pointer to the struct below.

Best Wishes

Phillip

>   struct strmap {
>   	struct hashmap map;
> +	struct mem_pool *pool;
>   	unsigned int strdup_strings:1;
>   };
>   
> @@ -37,9 +39,10 @@ void strmap_init(struct strmap *map);
>   
>   /*
>    * Same as strmap_init, but for those who want to control the memory management
> - * carefully instead of using the default of strdup_strings=1.
> + * carefully instead of using the default of strdup_strings=1 and pool=NULL.
>    */
>   void strmap_init_with_options(struct strmap *map,
> +			      struct mem_pool *pool,
>   			      int strdup_strings);
>   
>   /*
> @@ -137,9 +140,10 @@ static inline void strintmap_init(struct strintmap *map, int default_value)
>   
>   static inline void strintmap_init_with_options(struct strintmap *map,
>   					       int default_value,
> +					       struct mem_pool *pool,
>   					       int strdup_strings)
>   {
> -	strmap_init_with_options(&map->map, strdup_strings);
> +	strmap_init_with_options(&map->map, pool, strdup_strings);
>   	map->default_value = default_value;
>   }
>   
> @@ -221,9 +225,10 @@ static inline void strset_init(struct strset *set)
>   }
>   
>   static inline void strset_init_with_options(struct strset *set,
> +					    struct mem_pool *pool,
>   					    int strdup_strings)
>   {
> -	strmap_init_with_options(&set->map, strdup_strings);
> +	strmap_init_with_options(&set->map, pool, strdup_strings);
>   }
>   
>   static inline void strset_clear(struct strset *set)
> 

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool
  2020-11-11 17:33           ` Phillip Wood
@ 2020-11-11 18:49             ` Elijah Newren
  2020-11-11 19:01             ` Jeff King
  1 sibling, 0 replies; 144+ messages in thread
From: Elijah Newren @ 2020-11-11 18:49 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jeff King

On Wed, Nov 11, 2020 at 9:33 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> Hi Elijah
>
> On 06/11/2020 00:24, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > For heavy users of strmaps, allowing the keys and entries to be
> > allocated from a memory pool can provide significant overhead savings.
> > Add an option to strmap_init_with_options() to specify a memory pool.
> > [...]
> > diff --git a/strmap.h b/strmap.h
> > index c8c4d7c932..dda928703d 100644
> > --- a/strmap.h
> > +++ b/strmap.h
> > @@ -3,8 +3,10 @@
> >
> >   #include "hashmap.h"
> >
> > +struct mempool;
>
> I think this is a typo - I assume you wanted to declare `struct
> mem_pool` but it's not strictly necessary as you're only adding a
> pointer to the struct below.
>
> Best Wishes
>
> Phillip

Indeed, thanks.

>
> >   struct strmap {
> >       struct hashmap map;
> > +     struct mem_pool *pool;
> >       unsigned int strdup_strings:1;
> >   };
> >
> > @@ -37,9 +39,10 @@ void strmap_init(struct strmap *map);
> >
> >   /*
> >    * Same as strmap_init, but for those who want to control the memory management
> > - * carefully instead of using the default of strdup_strings=1.
> > + * carefully instead of using the default of strdup_strings=1 and pool=NULL.
> >    */
> >   void strmap_init_with_options(struct strmap *map,
> > +                           struct mem_pool *pool,
> >                             int strdup_strings);
> >
> >   /*
> > @@ -137,9 +140,10 @@ static inline void strintmap_init(struct strintmap *map, int default_value)
> >
> >   static inline void strintmap_init_with_options(struct strintmap *map,
> >                                              int default_value,
> > +                                            struct mem_pool *pool,
> >                                              int strdup_strings)
> >   {
> > -     strmap_init_with_options(&map->map, strdup_strings);
> > +     strmap_init_with_options(&map->map, pool, strdup_strings);
> >       map->default_value = default_value;
> >   }
> >
> > @@ -221,9 +225,10 @@ static inline void strset_init(struct strset *set)
> >   }
> >
> >   static inline void strset_init_with_options(struct strset *set,
> > +                                         struct mem_pool *pool,
> >                                           int strdup_strings)
> >   {
> > -     strmap_init_with_options(&set->map, strdup_strings);
> > +     strmap_init_with_options(&set->map, pool, strdup_strings);
> >   }
> >
> >   static inline void strset_clear(struct strset *set)
> >

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool
  2020-11-11 17:33           ` Phillip Wood
  2020-11-11 18:49             ` Elijah Newren
@ 2020-11-11 19:01             ` Jeff King
  2020-11-11 20:34               ` Chris Torek
  1 sibling, 1 reply; 144+ messages in thread
From: Jeff King @ 2020-11-11 19:01 UTC (permalink / raw)
  To: phillip.wood; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren

On Wed, Nov 11, 2020 at 05:33:47PM +0000, Phillip Wood wrote:

> On 06/11/2020 00:24, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> > 
> > For heavy users of strmaps, allowing the keys and entries to be
> > allocated from a memory pool can provide significant overhead savings.
> > Add an option to strmap_init_with_options() to specify a memory pool.
> > [...]
> > diff --git a/strmap.h b/strmap.h
> > index c8c4d7c932..dda928703d 100644
> > --- a/strmap.h
> > +++ b/strmap.h
> > @@ -3,8 +3,10 @@
> >   #include "hashmap.h"
> > +struct mempool;
> 
> I think this is a typo - I assume you wanted to declare `struct mem_pool`
> but it's not strictly necessary as you're only adding a pointer to the
> struct below.

Good catch.

Even if we're only using a pointer to it, we still need a valid forward
declaration (using the pointer only saves us from needing the full
definition). Or so I thought.

It looks like the compiler will treat the use inside the struct:

> >   struct strmap {
> >   	struct hashmap map;
> > +	struct mem_pool *pool;
> >   	unsigned int strdup_strings:1;
> >   };

as an implicit forward declaration, but not the ones inside the function
declarations:

> >   void strmap_init_with_options(struct strmap *map,
> > +			      struct mem_pool *pool,
> >   			      int strdup_strings);

If you replace the pointer in the struct definition with "struct foo",
then "make hdr-check" will complain about mem_pool in the function. And
likewise if you replace the ones in the function with "struct foo", then
we'll complain about those.

I'm not sure whether this is a seldom-seen corner of the C standard, or
a compiler-specific thing (though both clang and gcc seem to allow it).
At any rate, I think it is worth fixing the typo'd forward declaration
(rather than deleting it) to make the intention clear.

-Peff

^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 00/15] Add struct strmap and associated utility functions
  2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
                           ` (15 preceding siblings ...)
  2020-11-06  2:00         ` [PATCH v5 00/15] Add struct strmap and associated utility functions Junio C Hamano
@ 2020-11-11 20:02         ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
                             ` (15 more replies)
  16 siblings, 16 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren

Here I introduce new strmap, strintmap, and strset types.

Changes since v5:

 * Fixed a typo in forward declaration of struct mem_pool, spotted by
   Phillip. (Usage via pointers meant gcc & clang wouldn't complain.)

[1] 
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Elijah Newren (15):
  hashmap: add usage documentation explaining hashmap_free[_entries]()
  hashmap: adjust spacing to fix argument alignment
  hashmap: allow re-use after hashmap_free()
  hashmap: introduce a new hashmap_partial_clear()
  hashmap: provide deallocation function names
  strmap: new utility functions
  strmap: add more utility functions
  strmap: enable faster clearing and reusing of strmaps
  strmap: add functions facilitating use as a string->int map
  strmap: split create_entry() out of strmap_put()
  strmap: add a strset sub-type
  strmap: enable allocations to come from a mem_pool
  strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  Use new HASHMAP_INIT macro to simplify hashmap initialization
  shortlog: use strset from strmap.h

 Makefile                |   1 +
 add-interactive.c       |   2 +-
 attr.c                  |  26 ++--
 blame.c                 |   2 +-
 bloom.c                 |   5 +-
 builtin/difftool.c      |   9 +-
 builtin/fetch.c         |   6 +-
 builtin/shortlog.c      |  61 +--------
 config.c                |   2 +-
 diff.c                  |   4 +-
 diffcore-rename.c       |   2 +-
 dir.c                   |   8 +-
 hashmap.c               |  74 +++++++----
 hashmap.h               |  91 +++++++++++---
 merge-recursive.c       |   6 +-
 name-hash.c             |   4 +-
 object.c                |   2 +-
 oidmap.c                |   2 +-
 patch-ids.c             |   2 +-
 range-diff.c            |   6 +-
 ref-filter.c            |   2 +-
 revision.c              |  11 +-
 sequencer.c             |   4 +-
 strmap.c                | 178 ++++++++++++++++++++++++++
 strmap.h                | 268 ++++++++++++++++++++++++++++++++++++++++
 submodule-config.c      |   4 +-
 t/helper/test-hashmap.c |   9 +-
 27 files changed, 621 insertions(+), 170 deletions(-)
 create mode 100644 strmap.c
 create mode 100644 strmap.h


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-835%2Fnewren%2Fstrmap-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-835/newren/strmap-v6
Pull-Request: https://github.com/git/git/pull/835

Range-diff vs v5:

  1:  af6b6fcb46 =  1:  af6b6fcb46 hashmap: add usage documentation explaining hashmap_free[_entries]()
  2:  591161fd78 =  2:  591161fd78 hashmap: adjust spacing to fix argument alignment
  3:  f2718d036d =  3:  f2718d036d hashmap: allow re-use after hashmap_free()
  4:  61f1da3c51 =  4:  61f1da3c51 hashmap: introduce a new hashmap_partial_clear()
  5:  861e8d65ae =  5:  861e8d65ae hashmap: provide deallocation function names
  6:  448d3b219f =  6:  448d3b219f strmap: new utility functions
  7:  5e8004c728 =  7:  5e8004c728 strmap: add more utility functions
  8:  fd96e9fc8d =  8:  fd96e9fc8d strmap: enable faster clearing and reusing of strmaps
  9:  f499934f54 =  9:  f499934f54 strmap: add functions facilitating use as a string->int map
 10:  3bcceb8cdb = 10:  3bcceb8cdb strmap: split create_entry() out of strmap_put()
 11:  e128a71fec = 11:  e128a71fec strmap: add a strset sub-type
 12:  34f542d9dd ! 12:  3926c4c97b strmap: enable allocations to come from a mem_pool
     @@ strmap.h
       
       #include "hashmap.h"
       
     -+struct mempool;
     ++struct mem_pool;
       struct strmap {
       	struct hashmap map;
      +	struct mem_pool *pool;
 13:  39ec2fa411 = 13:  562595224b strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
 14:  d3713d88f2 = 14:  058e7a6b76 Use new HASHMAP_INIT macro to simplify hashmap initialization
 15:  24e5ce60f5 = 15:  9b494c26c1 shortlog: use strset from strmap.h

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]()
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
                             ` (14 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The existence of hashmap_free() and hashmap_free_entries() confused me,
and the docs weren't clear enough.  We are dealing with a map table,
entries in that table, and possibly also things each of those entries
point to.  I had to consult other source code examples and the
implementation.  Add a brief note to clarify the differences.  This will
become even more important once we introduce a new
hashmap_partial_clear() function which will add the question of whether
the table itself has been freed.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.h | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/hashmap.h b/hashmap.h
index b011b394fe..2994dc7a9c 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -236,13 +236,40 @@ void hashmap_init(struct hashmap *map,
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
- * Frees a hashmap structure and allocated memory, leaves entries undisturbed
+ * Frees a hashmap structure and allocated memory for the table, but does not
+ * free the entries nor anything they point to.
+ *
+ * Usage note:
+ *
+ * Many callers will need to iterate over all entries and free the data each
+ * entry points to; in such a case, they can free the entry itself while at it.
+ * Thus, you might see:
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *      free(e);
+ *    }
+ *    hashmap_free(map);
+ *
+ * instead of
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *    }
+ *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ *
+ * to avoid the implicit extra loop over the entries.  However, if there are
+ * no special fields in your entry that need to be freed beyond the entry
+ * itself, it is probably simpler to avoid the explicit loop and just call
+ * hashmap_free_entries().
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map
+ * where @member is the hashmap_entry struct used to associate with @map.
+ *
+ * See usage note above hashmap_free().
  */
 #define hashmap_free_entries(map, type, member) \
 	hashmap_free_(map, offsetof(type, member));
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 02/15] hashmap: adjust spacing to fix argument alignment
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
                             ` (13 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

No actual code changes; just whitespace adjustments.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 17 +++++++++--------
 hashmap.h | 22 +++++++++++-----------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index 09813e1a46..e44d8a3e85 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -92,8 +92,9 @@ static void alloc_table(struct hashmap *map, unsigned int size)
 }
 
 static inline int entry_equals(const struct hashmap *map,
-		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
-		const void *keydata)
+			       const struct hashmap_entry *e1,
+			       const struct hashmap_entry *e2,
+			       const void *keydata)
 {
 	return (e1 == e2) ||
 	       (e1->hash == e2->hash &&
@@ -101,7 +102,7 @@ static inline int entry_equals(const struct hashmap *map,
 }
 
 static inline unsigned int bucket(const struct hashmap *map,
-		const struct hashmap_entry *key)
+				  const struct hashmap_entry *key)
 {
 	return key->hash & (map->tablesize - 1);
 }
@@ -148,7 +149,7 @@ static int always_equal(const void *unused_cmp_data,
 }
 
 void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		const void *cmpfn_data, size_t initial_size)
+		  const void *cmpfn_data, size_t initial_size)
 {
 	unsigned int size = HASHMAP_INITIAL_SIZE;
 
@@ -199,7 +200,7 @@ struct hashmap_entry *hashmap_get(const struct hashmap *map,
 }
 
 struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
-			const struct hashmap_entry *entry)
+				       const struct hashmap_entry *entry)
 {
 	struct hashmap_entry *e = entry->next;
 	for (; e; e = e->next)
@@ -225,8 +226,8 @@ void hashmap_add(struct hashmap *map, struct hashmap_entry *entry)
 }
 
 struct hashmap_entry *hashmap_remove(struct hashmap *map,
-					const struct hashmap_entry *key,
-					const void *keydata)
+				     const struct hashmap_entry *key,
+				     const void *keydata)
 {
 	struct hashmap_entry *old;
 	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
@@ -249,7 +250,7 @@ struct hashmap_entry *hashmap_remove(struct hashmap *map,
 }
 
 struct hashmap_entry *hashmap_put(struct hashmap *map,
-				struct hashmap_entry *entry)
+				  struct hashmap_entry *entry)
 {
 	struct hashmap_entry *old = hashmap_remove(map, entry, NULL);
 	hashmap_add(map, entry);
diff --git a/hashmap.h b/hashmap.h
index 2994dc7a9c..904f61d6e1 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -228,9 +228,9 @@ struct hashmap {
  * prevent expensive resizing. If 0, the table is dynamically resized.
  */
 void hashmap_init(struct hashmap *map,
-			 hashmap_cmp_fn equals_function,
-			 const void *equals_function_data,
-			 size_t initial_size);
+		  hashmap_cmp_fn equals_function,
+		  const void *equals_function_data,
+		  size_t initial_size);
 
 /* internal function for freeing hashmap */
 void hashmap_free_(struct hashmap *map, ssize_t offset);
@@ -288,7 +288,7 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  * and if it is on stack, you can just let it go out of scope).
  */
 static inline void hashmap_entry_init(struct hashmap_entry *e,
-					unsigned int hash)
+				      unsigned int hash)
 {
 	e->hash = hash;
 	e->next = NULL;
@@ -330,8 +330,8 @@ static inline unsigned int hashmap_get_size(struct hashmap *map)
  * to `hashmap_cmp_fn` to decide whether the entry matches the key.
  */
 struct hashmap_entry *hashmap_get(const struct hashmap *map,
-				const struct hashmap_entry *key,
-				const void *keydata);
+				  const struct hashmap_entry *key,
+				  const void *keydata);
 
 /*
  * Returns the hashmap entry for the specified hash code and key data,
@@ -364,7 +364,7 @@ static inline struct hashmap_entry *hashmap_get_from_hash(
  * call to `hashmap_get` or `hashmap_get_next`.
  */
 struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
-			const struct hashmap_entry *entry);
+				       const struct hashmap_entry *entry);
 
 /*
  * Adds a hashmap entry. This allows to add duplicate entries (i.e.
@@ -384,7 +384,7 @@ void hashmap_add(struct hashmap *map, struct hashmap_entry *entry);
  * Returns the replaced entry, or NULL if not found (i.e. the entry was added).
  */
 struct hashmap_entry *hashmap_put(struct hashmap *map,
-				struct hashmap_entry *entry);
+				  struct hashmap_entry *entry);
 
 /*
  * Adds or replaces a hashmap entry contained within @keyvar,
@@ -406,8 +406,8 @@ struct hashmap_entry *hashmap_put(struct hashmap *map,
  * Argument explanation is the same as in `hashmap_get`.
  */
 struct hashmap_entry *hashmap_remove(struct hashmap *map,
-					const struct hashmap_entry *key,
-					const void *keydata);
+				     const struct hashmap_entry *key,
+				     const void *keydata);
 
 /*
  * Removes a hashmap entry contained within @keyvar,
@@ -449,7 +449,7 @@ struct hashmap_entry *hashmap_iter_next(struct hashmap_iter *iter);
 
 /* Initializes the iterator and returns the first entry, if any. */
 static inline struct hashmap_entry *hashmap_iter_first(struct hashmap *map,
-		struct hashmap_iter *iter)
+						       struct hashmap_iter *iter)
 {
 	hashmap_iter_init(map, iter);
 	return hashmap_iter_next(iter);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 03/15] hashmap: allow re-use after hashmap_free()
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
                             ` (12 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Previously, once map->table had been freed, any calls to hashmap_put(),
hashmap_get(), or hashmap_remove() would cause a NULL pointer
dereference (since hashmap_free_() also zeros the memory; without that
zeroing, calling these functions would cause a use-after-free problem).

Modify these functions to check for a NULL table and automatically
allocate as needed.

Also add a HASHMAP_INIT(fn, data) macro for initializing hashmaps on the
stack without calling hashmap_init().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 16 ++++++++++++++--
 hashmap.h |  3 +++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index e44d8a3e85..bb7c9979b8 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -114,6 +114,7 @@ int hashmap_bucket(const struct hashmap *map, unsigned int hash)
 
 static void rehash(struct hashmap *map, unsigned int newsize)
 {
+	/* map->table MUST NOT be NULL when this function is called */
 	unsigned int i, oldsize = map->tablesize;
 	struct hashmap_entry **oldtable = map->table;
 
@@ -134,6 +135,7 @@ static void rehash(struct hashmap *map, unsigned int newsize)
 static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
 		const struct hashmap_entry *key, const void *keydata)
 {
+	/* map->table MUST NOT be NULL when this function is called */
 	struct hashmap_entry **e = &map->table[bucket(map, key)];
 	while (*e && !entry_equals(map, *e, key, keydata))
 		e = &(*e)->next;
@@ -196,6 +198,8 @@ struct hashmap_entry *hashmap_get(const struct hashmap *map,
 				const struct hashmap_entry *key,
 				const void *keydata)
 {
+	if (!map->table)
+		return NULL;
 	return *find_entry_ptr(map, key, keydata);
 }
 
@@ -211,8 +215,12 @@ struct hashmap_entry *hashmap_get_next(const struct hashmap *map,
 
 void hashmap_add(struct hashmap *map, struct hashmap_entry *entry)
 {
-	unsigned int b = bucket(map, entry);
+	unsigned int b;
+
+	if (!map->table)
+		alloc_table(map, HASHMAP_INITIAL_SIZE);
 
+	b = bucket(map, entry);
 	/* add entry */
 	entry->next = map->table[b];
 	map->table[b] = entry;
@@ -230,7 +238,11 @@ struct hashmap_entry *hashmap_remove(struct hashmap *map,
 				     const void *keydata)
 {
 	struct hashmap_entry *old;
-	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
+	struct hashmap_entry **e;
+
+	if (!map->table)
+		return NULL;
+	e = find_entry_ptr(map, key, keydata);
 	if (!*e)
 		return NULL;
 
diff --git a/hashmap.h b/hashmap.h
index 904f61d6e1..3b0f2bcade 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -210,6 +210,9 @@ struct hashmap {
 
 /* hashmap functions */
 
+#define HASHMAP_INIT(fn, data) { .cmpfn = fn, .cmpfn_data = data, \
+				 .do_count_items = 1 }
+
 /*
  * Initializes a hashmap structure.
  *
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 04/15] hashmap: introduce a new hashmap_partial_clear()
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
                             ` (2 preceding siblings ...)
  2020-11-11 20:02           ` [PATCH v6 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
                             ` (11 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge-ort is a heavy user of strmaps, which are built on hashmap.[ch].
clear_or_reinit_internal_opts() in merge-ort was taking about 12% of
overall runtime in my testcase involving rebasing 35 patches of
linux.git across a big rename.  clear_or_reinit_internal_opts() was
calling hashmap_free() followed by hashmap_init(), meaning that not only
was it freeing all the memory associated with each of the strmaps just
to immediately allocate a new array again, it was allocating a new array
that was likely smaller than needed (thus resulting in later need to
rehash things).  The ending size of the map table on the previous commit
was likely almost perfectly sized for the next commit we wanted to pick,
and not dropping and reallocating the table immediately is a win.

Add some new API to hashmap to clear a hashmap of entries without
freeing map->table (and instead only zeroing it out like alloc_table()
would do, along with zeroing the count of items in the table and the
shrink_at field).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.c | 39 +++++++++++++++++++++++++++------------
 hashmap.h | 13 ++++++++++++-
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/hashmap.c b/hashmap.c
index bb7c9979b8..922ed07954 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -174,22 +174,37 @@ void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
 	map->do_count_items = 1;
 }
 
+static void free_individual_entries(struct hashmap *map, ssize_t entry_offset)
+{
+	struct hashmap_iter iter;
+	struct hashmap_entry *e;
+
+	hashmap_iter_init(map, &iter);
+	while ((e = hashmap_iter_next(&iter)))
+		/*
+		 * like container_of, but using caller-calculated
+		 * offset (caller being hashmap_free_entries)
+		 */
+		free((char *)e - entry_offset);
+}
+
+void hashmap_partial_clear_(struct hashmap *map, ssize_t entry_offset)
+{
+	if (!map || !map->table)
+		return;
+	if (entry_offset >= 0)  /* called by hashmap_clear_entries */
+		free_individual_entries(map, entry_offset);
+	memset(map->table, 0, map->tablesize * sizeof(struct hashmap_entry *));
+	map->shrink_at = 0;
+	map->private_size = 0;
+}
+
 void hashmap_free_(struct hashmap *map, ssize_t entry_offset)
 {
 	if (!map || !map->table)
 		return;
-	if (entry_offset >= 0) { /* called by hashmap_free_entries */
-		struct hashmap_iter iter;
-		struct hashmap_entry *e;
-
-		hashmap_iter_init(map, &iter);
-		while ((e = hashmap_iter_next(&iter)))
-			/*
-			 * like container_of, but using caller-calculated
-			 * offset (caller being hashmap_free_entries)
-			 */
-			free((char *)e - entry_offset);
-	}
+	if (entry_offset >= 0)  /* called by hashmap_free_entries */
+		free_individual_entries(map, entry_offset);
 	free(map->table);
 	memset(map, 0, sizeof(*map));
 }
diff --git a/hashmap.h b/hashmap.h
index 3b0f2bcade..e9430d582a 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -235,7 +235,8 @@ void hashmap_init(struct hashmap *map,
 		  const void *equals_function_data,
 		  size_t initial_size);
 
-/* internal function for freeing hashmap */
+/* internal functions for clearing or freeing hashmap */
+void hashmap_partial_clear_(struct hashmap *map, ssize_t offset);
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
@@ -268,6 +269,16 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
+/*
+ * Basically the same as calling hashmap_free() followed by hashmap_init(),
+ * but doesn't incur the overhead of deallocating and reallocating
+ * map->table; it leaves map->table allocated and the same size but zeroes
+ * it out so it's ready for use again as an empty map.  As with
+ * hashmap_free(), you may need to free the entries yourself before calling
+ * this function.
+ */
+#define hashmap_partial_clear(map) hashmap_partial_clear_(map, -1)
+
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
  * where @member is the hashmap_entry struct used to associate with @map.
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 05/15] hashmap: provide deallocation function names
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
                             ` (3 preceding siblings ...)
  2020-11-11 20:02           ` [PATCH v6 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 06/15] strmap: new utility functions Elijah Newren via GitGitGadget
                             ` (10 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported.  Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over
to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 add-interactive.c       |  2 +-
 blame.c                 |  2 +-
 bloom.c                 |  2 +-
 builtin/fetch.c         |  6 +++---
 builtin/shortlog.c      |  2 +-
 config.c                |  2 +-
 diff.c                  |  4 ++--
 diffcore-rename.c       |  2 +-
 dir.c                   |  8 ++++----
 hashmap.c               |  6 +++---
 hashmap.h               | 44 +++++++++++++++++++++++++----------------
 merge-recursive.c       |  6 +++---
 name-hash.c             |  4 ++--
 object.c                |  2 +-
 oidmap.c                |  2 +-
 patch-ids.c             |  2 +-
 range-diff.c            |  2 +-
 ref-filter.c            |  2 +-
 revision.c              |  2 +-
 sequencer.c             |  4 ++--
 submodule-config.c      |  4 ++--
 t/helper/test-hashmap.c |  6 +++---
 22 files changed, 63 insertions(+), 53 deletions(-)

diff --git a/add-interactive.c b/add-interactive.c
index 555c4abf32..a14c0feaa2 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -557,7 +557,7 @@ static int get_modified_files(struct repository *r,
 		if (ps)
 			clear_pathspec(&rev.prune_data);
 	}
-	hashmap_free_entries(&s.file_map, struct pathname_entry, ent);
+	hashmap_clear_and_free(&s.file_map, struct pathname_entry, ent);
 	if (unmerged_count)
 		*unmerged_count = s.unmerged_count;
 	if (binary_count)
diff --git a/blame.c b/blame.c
index 686845b2b4..229beb6452 100644
--- a/blame.c
+++ b/blame.c
@@ -435,7 +435,7 @@ static void get_fingerprint(struct fingerprint *result,
 
 static void free_fingerprint(struct fingerprint *f)
 {
-	hashmap_free(&f->map);
+	hashmap_clear(&f->map);
 	free(f->entries);
 }
 
diff --git a/bloom.c b/bloom.c
index 68c73200a5..719c313a1c 100644
--- a/bloom.c
+++ b/bloom.c
@@ -287,7 +287,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 		}
 
 	cleanup:
-		hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry);
+		hashmap_clear_and_free(&pathmap, struct pathmap_hash_entry, entry);
 	} else {
 		for (i = 0; i < diff_queued_diff.nr; i++)
 			diff_free_filepair(diff_queued_diff.queue[i]);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f9c3c49f14..ecf8537605 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -393,7 +393,7 @@ static void find_non_local_tags(const struct ref *refs,
 		item = refname_hash_add(&remote_refs, ref->name, &ref->old_oid);
 		string_list_insert(&remote_refs_list, ref->name);
 	}
-	hashmap_free_entries(&existing_refs, struct refname_hash_entry, ent);
+	hashmap_clear_and_free(&existing_refs, struct refname_hash_entry, ent);
 
 	/*
 	 * We may have a final lightweight tag that needs to be
@@ -428,7 +428,7 @@ static void find_non_local_tags(const struct ref *refs,
 		**tail = rm;
 		*tail = &rm->next;
 	}
-	hashmap_free_entries(&remote_refs, struct refname_hash_entry, ent);
+	hashmap_clear_and_free(&remote_refs, struct refname_hash_entry, ent);
 	string_list_clear(&remote_refs_list, 0);
 	oidset_clear(&fetch_oids);
 }
@@ -573,7 +573,7 @@ static struct ref *get_ref_map(struct remote *remote,
 		}
 	}
 	if (existing_refs_populated)
-		hashmap_free_entries(&existing_refs, struct refname_hash_entry, ent);
+		hashmap_clear_and_free(&existing_refs, struct refname_hash_entry, ent);
 
 	return ref_map;
 }
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 0a5c4968f6..83f0a739b4 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -220,7 +220,7 @@ static void strset_clear(struct strset *ss)
 {
 	if (!ss->map.table)
 		return;
-	hashmap_free_entries(&ss->map, struct strset_item, ent);
+	hashmap_clear_and_free(&ss->map, struct strset_item, ent);
 }
 
 static void insert_records_from_trailers(struct shortlog *log,
diff --git a/config.c b/config.c
index 2bdff4457b..8f324ed3a6 100644
--- a/config.c
+++ b/config.c
@@ -1963,7 +1963,7 @@ void git_configset_clear(struct config_set *cs)
 		free(entry->key);
 		string_list_clear(&entry->value_list, 1);
 	}
-	hashmap_free_entries(&cs->config_hash, struct config_set_element, ent);
+	hashmap_clear_and_free(&cs->config_hash, struct config_set_element, ent);
 	cs->hash_initialized = 0;
 	free(cs->list.items);
 	cs->list.nr = 0;
diff --git a/diff.c b/diff.c
index 2bb2f8f57e..8e0e59f5cf 100644
--- a/diff.c
+++ b/diff.c
@@ -6289,9 +6289,9 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
 				dim_moved_lines(o);
 
-			hashmap_free_entries(&add_lines, struct moved_entry,
+			hashmap_clear_and_free(&add_lines, struct moved_entry,
 						ent);
-			hashmap_free_entries(&del_lines, struct moved_entry,
+			hashmap_clear_and_free(&del_lines, struct moved_entry,
 						ent);
 		}
 
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 99e63e90f8..d367a6d244 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -407,7 +407,7 @@ static int find_exact_renames(struct diff_options *options)
 		renames += find_identical_files(&file_table, i, options);
 
 	/* Free the hash data structure and entries */
-	hashmap_free_entries(&file_table, struct file_similarity, entry);
+	hashmap_clear_and_free(&file_table, struct file_similarity, entry);
 
 	return renames;
 }
diff --git a/dir.c b/dir.c
index 78387110e6..161dce121e 100644
--- a/dir.c
+++ b/dir.c
@@ -817,8 +817,8 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 
 clear_hashmaps:
 	warning(_("disabling cone pattern matching"));
-	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
-	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->parent_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->recursive_hashmap, struct pattern_entry, ent);
 	pl->use_cone_patterns = 0;
 }
 
@@ -921,8 +921,8 @@ void clear_pattern_list(struct pattern_list *pl)
 		free(pl->patterns[i]);
 	free(pl->patterns);
 	free(pl->filebuf);
-	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
-	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->recursive_hashmap, struct pattern_entry, ent);
+	hashmap_clear_and_free(&pl->parent_hashmap, struct pattern_entry, ent);
 
 	memset(pl, 0, sizeof(*pl));
 }
diff --git a/hashmap.c b/hashmap.c
index 922ed07954..5009471800 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -183,7 +183,7 @@ static void free_individual_entries(struct hashmap *map, ssize_t entry_offset)
 	while ((e = hashmap_iter_next(&iter)))
 		/*
 		 * like container_of, but using caller-calculated
-		 * offset (caller being hashmap_free_entries)
+		 * offset (caller being hashmap_clear_and_free)
 		 */
 		free((char *)e - entry_offset);
 }
@@ -199,11 +199,11 @@ void hashmap_partial_clear_(struct hashmap *map, ssize_t entry_offset)
 	map->private_size = 0;
 }
 
-void hashmap_free_(struct hashmap *map, ssize_t entry_offset)
+void hashmap_clear_(struct hashmap *map, ssize_t entry_offset)
 {
 	if (!map || !map->table)
 		return;
-	if (entry_offset >= 0)  /* called by hashmap_free_entries */
+	if (entry_offset >= 0)  /* called by hashmap_clear_and_free */
 		free_individual_entries(map, entry_offset);
 	free(map->table);
 	memset(map, 0, sizeof(*map));
diff --git a/hashmap.h b/hashmap.h
index e9430d582a..7251687d73 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -96,7 +96,7 @@
  *         }
  *
  *         if (!strcmp("end", action)) {
- *             hashmap_free_entries(&map, struct long2string, ent);
+ *             hashmap_clear_and_free(&map, struct long2string, ent);
  *             break;
  *         }
  *     }
@@ -237,7 +237,7 @@ void hashmap_init(struct hashmap *map,
 
 /* internal functions for clearing or freeing hashmap */
 void hashmap_partial_clear_(struct hashmap *map, ssize_t offset);
-void hashmap_free_(struct hashmap *map, ssize_t offset);
+void hashmap_clear_(struct hashmap *map, ssize_t offset);
 
 /*
  * Frees a hashmap structure and allocated memory for the table, but does not
@@ -253,40 +253,50 @@ void hashmap_free_(struct hashmap *map, ssize_t offset);
  *      free(e->somefield);
  *      free(e);
  *    }
- *    hashmap_free(map);
+ *    hashmap_clear(map);
  *
  * instead of
  *
  *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
  *      free(e->somefield);
  *    }
- *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ *    hashmap_clear_and_free(map, struct my_entry_struct, hashmap_entry_name);
  *
  * to avoid the implicit extra loop over the entries.  However, if there are
  * no special fields in your entry that need to be freed beyond the entry
  * itself, it is probably simpler to avoid the explicit loop and just call
- * hashmap_free_entries().
+ * hashmap_clear_and_free().
  */
-#define hashmap_free(map) hashmap_free_(map, -1)
+#define hashmap_clear(map) hashmap_clear_(map, -1)
 
 /*
- * Basically the same as calling hashmap_free() followed by hashmap_init(),
- * but doesn't incur the overhead of deallocating and reallocating
- * map->table; it leaves map->table allocated and the same size but zeroes
- * it out so it's ready for use again as an empty map.  As with
- * hashmap_free(), you may need to free the entries yourself before calling
- * this function.
+ * Similar to hashmap_clear(), except that the table is no deallocated; it
+ * is merely zeroed out but left the same size as before.  If the hashmap
+ * will be reused, this avoids the overhead of deallocating and
+ * reallocating map->table.  As with hashmap_clear(), you may need to free
+ * the entries yourself before calling this function.
  */
 #define hashmap_partial_clear(map) hashmap_partial_clear_(map, -1)
 
 /*
- * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map.
+ * Similar to hashmap_clear() but also frees all entries.  @type is the
+ * struct type of the entry where @member is the hashmap_entry struct used
+ * to associate with @map.
  *
- * See usage note above hashmap_free().
+ * See usage note above hashmap_clear().
  */
-#define hashmap_free_entries(map, type, member) \
-	hashmap_free_(map, offsetof(type, member));
+#define hashmap_clear_and_free(map, type, member) \
+	hashmap_clear_(map, offsetof(type, member))
+
+/*
+ * Similar to hashmap_partial_clear() but also frees all entries.  @type is
+ * the struct type of the entry where @member is the hashmap_entry struct
+ * used to associate with @map.
+ *
+ * See usage note above hashmap_clear().
+ */
+#define hashmap_partial_clear_and_free(map, type, member) \
+	hashmap_partial_clear_(map, offsetof(type, member))
 
 /* hashmap_entry functions */
 
diff --git a/merge-recursive.c b/merge-recursive.c
index d0214335a7..f736a0f632 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2651,7 +2651,7 @@ static struct string_list *get_renames(struct merge_options *opt,
 		free(e->target_file);
 		string_list_clear(&e->source_files, 0);
 	}
-	hashmap_free_entries(&collisions, struct collision_entry, ent);
+	hashmap_clear_and_free(&collisions, struct collision_entry, ent);
 	return renames;
 }
 
@@ -2870,7 +2870,7 @@ static void initial_cleanup_rename(struct diff_queue_struct *pairs,
 		strbuf_release(&e->new_dir);
 		/* possible_new_dirs already cleared in get_directory_renames */
 	}
-	hashmap_free_entries(dir_renames, struct dir_rename_entry, ent);
+	hashmap_clear_and_free(dir_renames, struct dir_rename_entry, ent);
 	free(dir_renames);
 
 	free(pairs->queue);
@@ -3497,7 +3497,7 @@ static int merge_trees_internal(struct merge_options *opt,
 		string_list_clear(entries, 1);
 		free(entries);
 
-		hashmap_free_entries(&opt->priv->current_file_dir_set,
+		hashmap_clear_and_free(&opt->priv->current_file_dir_set,
 					struct path_hashmap_entry, e);
 
 		if (clean < 0) {
diff --git a/name-hash.c b/name-hash.c
index fb526a3775..5d3c7b12c1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -726,6 +726,6 @@ void free_name_hash(struct index_state *istate)
 		return;
 	istate->name_hash_initialized = 0;
 
-	hashmap_free(&istate->name_hash);
-	hashmap_free_entries(&istate->dir_hash, struct dir_entry, ent);
+	hashmap_clear(&istate->name_hash);
+	hashmap_clear_and_free(&istate->dir_hash, struct dir_entry, ent);
 }
diff --git a/object.c b/object.c
index 3257518656..b8406409d5 100644
--- a/object.c
+++ b/object.c
@@ -532,7 +532,7 @@ void raw_object_store_clear(struct raw_object_store *o)
 	close_object_store(o);
 	o->packed_git = NULL;
 
-	hashmap_free(&o->pack_map);
+	hashmap_clear(&o->pack_map);
 }
 
 void parsed_object_pool_clear(struct parsed_object_pool *o)
diff --git a/oidmap.c b/oidmap.c
index 423aa014a3..286a04a53c 100644
--- a/oidmap.c
+++ b/oidmap.c
@@ -27,7 +27,7 @@ void oidmap_free(struct oidmap *map, int free_entries)
 		return;
 
 	/* TODO: make oidmap itself not depend on struct layouts */
-	hashmap_free_(&map->map, free_entries ? 0 : -1);
+	hashmap_clear_(&map->map, free_entries ? 0 : -1);
 }
 
 void *oidmap_get(const struct oidmap *map, const struct object_id *key)
diff --git a/patch-ids.c b/patch-ids.c
index 12aa6d494b..21973e4933 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -71,7 +71,7 @@ int init_patch_ids(struct repository *r, struct patch_ids *ids)
 
 int free_patch_ids(struct patch_ids *ids)
 {
-	hashmap_free_entries(&ids->patches, struct patch_id, ent);
+	hashmap_clear_and_free(&ids->patches, struct patch_id, ent);
 	return 0;
 }
 
diff --git a/range-diff.c b/range-diff.c
index 24dc435e48..befeecae44 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -266,7 +266,7 @@ static void find_exact_matches(struct string_list *a, struct string_list *b)
 		}
 	}
 
-	hashmap_free(&map);
+	hashmap_clear(&map);
 }
 
 static void diffsize_consume(void *data, char *line, unsigned long len)
diff --git a/ref-filter.c b/ref-filter.c
index c62f6b4822..5e66b8cd76 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2222,7 +2222,7 @@ void ref_array_clear(struct ref_array *array)
 	used_atom_cnt = 0;
 
 	if (ref_to_worktree_map.worktrees) {
-		hashmap_free_entries(&(ref_to_worktree_map.map),
+		hashmap_clear_and_free(&(ref_to_worktree_map.map),
 					struct ref_to_worktree_entry, ent);
 		free_worktrees(ref_to_worktree_map.worktrees);
 		ref_to_worktree_map.worktrees = NULL;
diff --git a/revision.c b/revision.c
index aa62212040..f27649d45d 100644
--- a/revision.c
+++ b/revision.c
@@ -139,7 +139,7 @@ static void paths_and_oids_clear(struct hashmap *map)
 		free(entry->path);
 	}
 
-	hashmap_free_entries(map, struct path_and_oids_entry, ent);
+	hashmap_clear_and_free(map, struct path_and_oids_entry, ent);
 }
 
 static void paths_and_oids_insert(struct hashmap *map,
diff --git a/sequencer.c b/sequencer.c
index 00acb12496..23a09c3e7a 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -5058,7 +5058,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 
 	oidmap_free(&commit2todo, 1);
 	oidmap_free(&state.commit2label, 1);
-	hashmap_free_entries(&state.labels, struct labels_entry, entry);
+	hashmap_clear_and_free(&state.labels, struct labels_entry, entry);
 	strbuf_release(&state.buf);
 
 	return 0;
@@ -5577,7 +5577,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 	for (i = 0; i < todo_list->nr; i++)
 		free(subjects[i]);
 	free(subjects);
-	hashmap_free_entries(&subject2item, struct subject2item_entry, entry);
+	hashmap_clear_and_free(&subject2item, struct subject2item_entry, entry);
 
 	clear_commit_todo_item(&commit_todo);
 
diff --git a/submodule-config.c b/submodule-config.c
index c569e22aa3..f502505566 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -103,8 +103,8 @@ static void submodule_cache_clear(struct submodule_cache *cache)
 				ent /* member name */)
 		free_one_config(entry);
 
-	hashmap_free_entries(&cache->for_path, struct submodule_entry, ent);
-	hashmap_free_entries(&cache->for_name, struct submodule_entry, ent);
+	hashmap_clear_and_free(&cache->for_path, struct submodule_entry, ent);
+	hashmap_clear_and_free(&cache->for_name, struct submodule_entry, ent);
 	cache->initialized = 0;
 	cache->gitmodules_read = 0;
 }
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index f38706216f..2475663b49 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -110,7 +110,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 				hashmap_add(&map, &entries[i]->ent);
 			}
 
-			hashmap_free(&map);
+			hashmap_clear(&map);
 		}
 	} else {
 		/* test map lookups */
@@ -130,7 +130,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 			}
 		}
 
-		hashmap_free(&map);
+		hashmap_clear(&map);
 	}
 }
 
@@ -262,6 +262,6 @@ int cmd__hashmap(int argc, const char **argv)
 	}
 
 	strbuf_release(&line);
-	hashmap_free_entries(&map, struct test_entry, ent);
+	hashmap_clear_and_free(&map, struct test_entry, ent);
 	return 0;
 }
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 06/15] strmap: new utility functions
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
                             ` (4 preceding siblings ...)
  2020-11-11 20:02           ` [PATCH v6 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 07/15] strmap: add more " Elijah Newren via GitGitGadget
                             ` (9 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add strmap as a new struct and associated utility functions,
specifically for hashmaps that map strings to some value.  The API is
taken directly from Peff's proposal at
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Note that similar string-list, I have a strdup_strings setting.
However, unlike string-list, strmap_init() does not take a parameter for
this setting and instead automatically sets it to 1; callers who want to
control this detail need to instead call strmap_init_with_options().
(Future patches will add additional parameters to
strmap_init_with_options()).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Makefile |  1 +
 strmap.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 strmap.h | 65 +++++++++++++++++++++++++++++++++++++
 3 files changed, 165 insertions(+)
 create mode 100644 strmap.c
 create mode 100644 strmap.h

diff --git a/Makefile b/Makefile
index 95571ee3fc..777a34c01c 100644
--- a/Makefile
+++ b/Makefile
@@ -1000,6 +1000,7 @@ LIB_OBJS += stable-qsort.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
+LIB_OBJS += strmap.o
 LIB_OBJS += strvec.o
 LIB_OBJS += sub-process.o
 LIB_OBJS += submodule-config.o
diff --git a/strmap.c b/strmap.c
new file mode 100644
index 0000000000..53f284eb20
--- /dev/null
+++ b/strmap.c
@@ -0,0 +1,99 @@
+#include "git-compat-util.h"
+#include "strmap.h"
+
+int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
+		     const struct hashmap_entry *entry1,
+		     const struct hashmap_entry *entry2,
+		     const void *keydata)
+{
+	const struct strmap_entry *e1, *e2;
+
+	e1 = container_of(entry1, const struct strmap_entry, ent);
+	e2 = container_of(entry2, const struct strmap_entry, ent);
+	return strcmp(e1->key, e2->key);
+}
+
+static struct strmap_entry *find_strmap_entry(struct strmap *map,
+					      const char *str)
+{
+	struct strmap_entry entry;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.key = str;
+	return hashmap_get_entry(&map->map, &entry, ent, NULL);
+}
+
+void strmap_init(struct strmap *map)
+{
+	strmap_init_with_options(map, 1);
+}
+
+void strmap_init_with_options(struct strmap *map,
+			      int strdup_strings)
+{
+	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
+	map->strdup_strings = strdup_strings;
+}
+
+static void strmap_free_entries_(struct strmap *map, int free_values)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+
+	if (!map)
+		return;
+
+	/*
+	 * We need to iterate over the hashmap entries and free
+	 * e->key and e->value ourselves; hashmap has no API to
+	 * take care of that for us.  Since we're already iterating over
+	 * the hashmap, though, might as well free e too and avoid the need
+	 * to make some call into the hashmap API to do that.
+	 */
+	hashmap_for_each_entry(&map->map, &iter, e, ent) {
+		if (free_values)
+			free(e->value);
+		if (map->strdup_strings)
+			free((char*)e->key);
+		free(e);
+	}
+}
+
+void strmap_clear(struct strmap *map, int free_values)
+{
+	strmap_free_entries_(map, free_values);
+	hashmap_clear(&map->map);
+}
+
+void *strmap_put(struct strmap *map, const char *str, void *data)
+{
+	struct strmap_entry *entry = find_strmap_entry(map, str);
+	void *old = NULL;
+
+	if (entry) {
+		old = entry->value;
+		entry->value = data;
+	} else {
+		const char *key = str;
+
+		entry = xmalloc(sizeof(*entry));
+		hashmap_entry_init(&entry->ent, strhash(str));
+
+		if (map->strdup_strings)
+			key = xstrdup(str);
+		entry->key = key;
+		entry->value = data;
+		hashmap_add(&map->map, &entry->ent);
+	}
+	return old;
+}
+
+void *strmap_get(struct strmap *map, const char *str)
+{
+	struct strmap_entry *entry = find_strmap_entry(map, str);
+	return entry ? entry->value : NULL;
+}
+
+int strmap_contains(struct strmap *map, const char *str)
+{
+	return find_strmap_entry(map, str) != NULL;
+}
diff --git a/strmap.h b/strmap.h
new file mode 100644
index 0000000000..96888c23ad
--- /dev/null
+++ b/strmap.h
@@ -0,0 +1,65 @@
+#ifndef STRMAP_H
+#define STRMAP_H
+
+#include "hashmap.h"
+
+struct strmap {
+	struct hashmap map;
+	unsigned int strdup_strings:1;
+};
+
+struct strmap_entry {
+	struct hashmap_entry ent;
+	const char *key;
+	void *value;
+};
+
+int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
+		     const struct hashmap_entry *entry1,
+		     const struct hashmap_entry *entry2,
+		     const void *keydata);
+
+#define STRMAP_INIT { \
+			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
+			.strdup_strings = 1,                          \
+		    }
+
+/*
+ * Initialize the members of the strmap.  Any keys added to the strmap will
+ * be strdup'ed with their memory managed by the strmap.
+ */
+void strmap_init(struct strmap *map);
+
+/*
+ * Same as strmap_init, but for those who want to control the memory management
+ * carefully instead of using the default of strdup_strings=1.
+ */
+void strmap_init_with_options(struct strmap *map,
+			      int strdup_strings);
+
+/*
+ * Remove all entries from the map, releasing any allocated resources.
+ */
+void strmap_clear(struct strmap *map, int free_values);
+
+/*
+ * Insert "str" into the map, pointing to "data".
+ *
+ * If an entry for "str" already exists, its data pointer is overwritten, and
+ * the original data pointer returned. Otherwise, returns NULL.
+ */
+void *strmap_put(struct strmap *map, const char *str, void *data);
+
+/*
+ * Return the data pointer mapped by "str", or NULL if the entry does not
+ * exist.
+ */
+void *strmap_get(struct strmap *map, const char *str);
+
+/*
+ * Return non-zero iff "str" is present in the map. This differs from
+ * strmap_get() in that it can distinguish entries with a NULL data pointer.
+ */
+int strmap_contains(struct strmap *map, const char *str);
+
+#endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 07/15] strmap: add more utility functions
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
                             ` (5 preceding siblings ...)
  2020-11-11 20:02           ` [PATCH v6 06/15] strmap: new utility functions Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
                             ` (8 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This adds a number of additional convienence functions I want/need:
  * strmap_get_size()
  * strmap_empty()
  * strmap_remove()
  * strmap_for_each_entry()
  * strmap_get_entry()

I suspect the first four are self-explanatory.

strmap_get_entry() is similar to strmap_get() except that instead of just
returning the void* value that the string maps to, it returns the
strmap_entry that contains both the string and the void* value (or
NULL if the string isn't in the map).  This is helpful because it avoids
multiple lookups, e.g. in some cases a caller would need to call:
  * strmap_contains() to check that the map has an entry for the string
  * strmap_get() to get the void* value
  * <do some work to update the value>
  * strmap_put() to update/overwrite the value
If the void* pointer returned really is a pointer, then the last step is
unnecessary, but if the void* pointer is just cast to an integer then
strmap_put() will be needed.  In contrast, one can call strmap_get_entry()
and then:
  * check if the string was in the map by whether the pointer is NULL
  * access the value via entry->value
  * directly update entry->value
meaning that we can replace two or three hash table lookups with one.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 20 ++++++++++++++++++++
 strmap.h | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/strmap.c b/strmap.c
index 53f284eb20..829f1bc095 100644
--- a/strmap.c
+++ b/strmap.c
@@ -87,6 +87,11 @@ void *strmap_put(struct strmap *map, const char *str, void *data)
 	return old;
 }
 
+struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
+{
+	return find_strmap_entry(map, str);
+}
+
 void *strmap_get(struct strmap *map, const char *str)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
@@ -97,3 +102,18 @@ int strmap_contains(struct strmap *map, const char *str)
 {
 	return find_strmap_entry(map, str) != NULL;
 }
+
+void strmap_remove(struct strmap *map, const char *str, int free_value)
+{
+	struct strmap_entry entry, *ret;
+	hashmap_entry_init(&entry.ent, strhash(str));
+	entry.key = str;
+	ret = hashmap_remove_entry(&map->map, &entry, ent, NULL);
+	if (!ret)
+		return;
+	if (free_value)
+		free(ret->value);
+	if (map->strdup_strings)
+		free((char*)ret->key);
+	free(ret);
+}
diff --git a/strmap.h b/strmap.h
index 96888c23ad..f74bc582e4 100644
--- a/strmap.h
+++ b/strmap.h
@@ -50,6 +50,12 @@ void strmap_clear(struct strmap *map, int free_values);
  */
 void *strmap_put(struct strmap *map, const char *str, void *data);
 
+/*
+ * Return the strmap_entry mapped by "str", or NULL if there is not such
+ * an item in map.
+ */
+struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str);
+
 /*
  * Return the data pointer mapped by "str", or NULL if the entry does not
  * exist.
@@ -62,4 +68,32 @@ void *strmap_get(struct strmap *map, const char *str);
  */
 int strmap_contains(struct strmap *map, const char *str);
 
+/*
+ * Remove the given entry from the strmap.  If the string isn't in the
+ * strmap, the map is not altered.
+ */
+void strmap_remove(struct strmap *map, const char *str, int free_value);
+
+/*
+ * Return how many entries the strmap has.
+ */
+static inline unsigned int strmap_get_size(struct strmap *map)
+{
+	return hashmap_get_size(&map->map);
+}
+
+/*
+ * Return whether the strmap is empty.
+ */
+static inline int strmap_empty(struct strmap *map)
+{
+	return strmap_get_size(map) == 0;
+}
+
+/*
+ * iterate through @map using @iter, @var is a pointer to a type strmap_entry
+ */
+#define strmap_for_each_entry(mystrmap, iter, var)	\
+	hashmap_for_each_entry(&(mystrmap)->map, iter, var, ent)
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 08/15] strmap: enable faster clearing and reusing of strmaps
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
                             ` (6 preceding siblings ...)
  2020-11-11 20:02           ` [PATCH v6 07/15] strmap: add more " Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
                             ` (7 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

When strmaps are used heavily, such as is done by my new merge-ort
algorithm, and strmaps need to be cleared but then re-used (because of
e.g. picking multiple commits to cherry-pick, or due to a recursive
merge having several different merges while recursing), free-ing and
reallocating map->table repeatedly can add up in time, especially since
it will likely be reallocated to a much smaller size but the previous
merge provides a good guide to the right size to use for the next merge.

Introduce strmap_partial_clear() to take advantage of this type of
situation; it will act similar to strmap_clear() except that
map->table's entries are zeroed instead of map->table being free'd.
Making use of this function reduced the cost of
clear_or_reinit_internal_opts() by about 20% in mert-ort, and dropped
the overall runtime of my rebase testcase by just under 2%.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 6 ++++++
 strmap.h | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/strmap.c b/strmap.c
index 829f1bc095..c410c5241a 100644
--- a/strmap.c
+++ b/strmap.c
@@ -64,6 +64,12 @@ void strmap_clear(struct strmap *map, int free_values)
 	hashmap_clear(&map->map);
 }
 
+void strmap_partial_clear(struct strmap *map, int free_values)
+{
+	strmap_free_entries_(map, free_values);
+	hashmap_partial_clear(&map->map);
+}
+
 void *strmap_put(struct strmap *map, const char *str, void *data)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
diff --git a/strmap.h b/strmap.h
index f74bc582e4..c14fcee148 100644
--- a/strmap.h
+++ b/strmap.h
@@ -42,6 +42,12 @@ void strmap_init_with_options(struct strmap *map,
  */
 void strmap_clear(struct strmap *map, int free_values);
 
+/*
+ * Similar to strmap_clear() but leaves map->map->table allocated and
+ * pre-sized so that subsequent uses won't need as many rehashings.
+ */
+void strmap_partial_clear(struct strmap *map, int free_values);
+
 /*
  * Insert "str" into the map, pointing to "data".
  *
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 09/15] strmap: add functions facilitating use as a string->int map
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
                             ` (7 preceding siblings ...)
  2020-11-11 20:02           ` [PATCH v6 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget
                             ` (6 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Although strmap could be used as a string->int map, one either had to
allocate an int for every entry and then deallocate later, or one had to
do a bunch of casting between (void*) and (intptr_t).

Add some special functions that do the casting.  Also, rename put->set
for such wrapper functions since 'put' implied there may be some
deallocation needed if the string was already found in the map, which
isn't the case when we're storing an int value directly in the void*
slot instead of using the void* slot as a pointer to data.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 11 +++++++
 strmap.h | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+)

diff --git a/strmap.c b/strmap.c
index c410c5241a..0d10a884b5 100644
--- a/strmap.c
+++ b/strmap.c
@@ -123,3 +123,14 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		free((char*)ret->key);
 	free(ret);
 }
+
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
+{
+	struct strmap_entry *entry = find_strmap_entry(&map->map, str);
+	if (entry) {
+		intptr_t *whence = (intptr_t*)&entry->value;
+		*whence += amt;
+	}
+	else
+		strintmap_set(map, str, map->default_value + amt);
+}
diff --git a/strmap.h b/strmap.h
index c14fcee148..56a5cdb864 100644
--- a/strmap.h
+++ b/strmap.h
@@ -23,6 +23,10 @@ int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 			.map = HASHMAP_INIT(cmp_strmap_entry, NULL),  \
 			.strdup_strings = 1,                          \
 		    }
+#define STRINTMAP_INIT { \
+			.map = STRMAP_INIT,   \
+			.default_value = 0,   \
+		       }
 
 /*
  * Initialize the members of the strmap.  Any keys added to the strmap will
@@ -102,4 +106,94 @@ static inline int strmap_empty(struct strmap *map)
 #define strmap_for_each_entry(mystrmap, iter, var)	\
 	hashmap_for_each_entry(&(mystrmap)->map, iter, var, ent)
 
+
+/*
+ * strintmap:
+ *    A map of string -> int, typecasting the void* of strmap to an int.
+ *
+ * Primary differences:
+ *    1) Since the void* value is just an int in disguise, there is no value
+ *       to free.  (Thus one fewer argument to strintmap_clear)
+ *    2) strintmap_get() returns an int, or returns the default_value if the
+ *       key is not found in the strintmap.
+ *    3) No strmap_put() equivalent; strintmap_set() and strintmap_incr()
+ *       instead.
+ */
+
+struct strintmap {
+	struct strmap map;
+	int default_value;
+};
+
+#define strintmap_for_each_entry(mystrmap, iter, var)	\
+	strmap_for_each_entry(&(mystrmap)->map, iter, var)
+
+static inline void strintmap_init(struct strintmap *map, int default_value)
+{
+	strmap_init(&map->map);
+	map->default_value = default_value;
+}
+
+static inline void strintmap_init_with_options(struct strintmap *map,
+					       int default_value,
+					       int strdup_strings)
+{
+	strmap_init_with_options(&map->map, strdup_strings);
+	map->default_value = default_value;
+}
+
+static inline void strintmap_clear(struct strintmap *map)
+{
+	strmap_clear(&map->map, 0);
+}
+
+static inline void strintmap_partial_clear(struct strintmap *map)
+{
+	strmap_partial_clear(&map->map, 0);
+}
+
+static inline int strintmap_contains(struct strintmap *map, const char *str)
+{
+	return strmap_contains(&map->map, str);
+}
+
+static inline void strintmap_remove(struct strintmap *map, const char *str)
+{
+	return strmap_remove(&map->map, str, 0);
+}
+
+static inline int strintmap_empty(struct strintmap *map)
+{
+	return strmap_empty(&map->map);
+}
+
+static inline unsigned int strintmap_get_size(struct strintmap *map)
+{
+	return strmap_get_size(&map->map);
+}
+
+/*
+ * Returns the value for str in the map.  If str isn't found in the map,
+ * the map's default_value is returned.
+ */
+static inline int strintmap_get(struct strintmap *map, const char *str)
+{
+	struct strmap_entry *result = strmap_get_entry(&map->map, str);
+	if (!result)
+		return map->default_value;
+	return (intptr_t)result->value;
+}
+
+static inline void strintmap_set(struct strintmap *map, const char *str,
+				 intptr_t v)
+{
+	strmap_put(&map->map, str, (void *)v);
+}
+
+/*
+ * Increment the value for str by amt.  If str isn't in the map, add it and
+ * set its value to default_value + amt.
+ */
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 10/15] strmap: split create_entry() out of strmap_put()
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
                             ` (8 preceding siblings ...)
  2020-11-11 20:02           ` [PATCH v6 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget
                             ` (5 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This will facilitate adding entries to a strmap subtype in ways that
differ slightly from that of strmap_put().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/strmap.c b/strmap.c
index 0d10a884b5..dc84c57c07 100644
--- a/strmap.c
+++ b/strmap.c
@@ -70,27 +70,36 @@ void strmap_partial_clear(struct strmap *map, int free_values)
 	hashmap_partial_clear(&map->map);
 }
 
+static struct strmap_entry *create_entry(struct strmap *map,
+					 const char *str,
+					 void *data)
+{
+	struct strmap_entry *entry;
+	const char *key = str;
+
+	entry = xmalloc(sizeof(*entry));
+	hashmap_entry_init(&entry->ent, strhash(str));
+
+	if (map->strdup_strings)
+		key = xstrdup(str);
+	entry->key = key;
+	entry->value = data;
+	return entry;
+}
+
 void *strmap_put(struct strmap *map, const char *str, void *data)
 {
 	struct strmap_entry *entry = find_strmap_entry(map, str);
-	void *old = NULL;
 
 	if (entry) {
-		old = entry->value;
+		void *old = entry->value;
 		entry->value = data;
-	} else {
-		const char *key = str;
-
-		entry = xmalloc(sizeof(*entry));
-		hashmap_entry_init(&entry->ent, strhash(str));
-
-		if (map->strdup_strings)
-			key = xstrdup(str);
-		entry->key = key;
-		entry->value = data;
-		hashmap_add(&map->map, &entry->ent);
+		return old;
 	}
-	return old;
+
+	entry = create_entry(map, str, data);
+	hashmap_add(&map->map, &entry->ent);
+	return NULL;
 }
 
 struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 11/15] strmap: add a strset sub-type
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
                             ` (9 preceding siblings ...)
  2020-11-11 20:02           ` [PATCH v6 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
                             ` (4 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Similar to adding strintmap for special-casing a string -> int mapping,
add a strset type for cases where we really are only interested in using
strmap for storing a set rather than a mapping.  In this case, we'll
always just store NULL for the value but the different struct type makes
it clearer than code comments how a variable is intended to be used.

The difference in usage also results in some differences in API: a few
things that aren't necessary or meaningful are dropped (namely, the
free_values argument to *_clear(), and the *_get() function), and
strset_add() is chosen as the API instead of strset_put().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 17 +++++++++++++++
 strmap.h | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/strmap.c b/strmap.c
index dc84c57c07..3784865745 100644
--- a/strmap.c
+++ b/strmap.c
@@ -143,3 +143,20 @@ void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
 	else
 		strintmap_set(map, str, map->default_value + amt);
 }
+
+int strset_add(struct strset *set, const char *str)
+{
+	/*
+	 * Cannot use strmap_put() because it'll return NULL in both cases:
+	 *   - cannot find str: NULL means "not found"
+	 *   - does find str: NULL is the value associated with str
+	 */
+	struct strmap_entry *entry = find_strmap_entry(&set->map, str);
+
+	if (entry)
+		return 0;
+
+	entry = create_entry(&set->map, str, NULL);
+	hashmap_add(&set->map.map, &entry->ent);
+	return 1;
+}
diff --git a/strmap.h b/strmap.h
index 56a5cdb864..c8c4d7c932 100644
--- a/strmap.h
+++ b/strmap.h
@@ -27,6 +27,7 @@ int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 			.map = STRMAP_INIT,   \
 			.default_value = 0,   \
 		       }
+#define STRSET_INIT { .map = STRMAP_INIT }
 
 /*
  * Initialize the members of the strmap.  Any keys added to the strmap will
@@ -196,4 +197,66 @@ static inline void strintmap_set(struct strintmap *map, const char *str,
  */
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
 
+/*
+ * strset:
+ *    A set of strings.
+ *
+ * Primary differences with strmap:
+ *    1) The value is always NULL, and ignored.  As there is no value to free,
+ *       there is one fewer argument to strset_clear
+ *    2) No strset_get() because there is no value.
+ *    3) No strset_put(); use strset_add() instead.
+ */
+
+struct strset {
+	struct strmap map;
+};
+
+#define strset_for_each_entry(mystrset, iter, var)	\
+	strmap_for_each_entry(&(mystrset)->map, iter, var)
+
+static inline void strset_init(struct strset *set)
+{
+	strmap_init(&set->map);
+}
+
+static inline void strset_init_with_options(struct strset *set,
+					    int strdup_strings)
+{
+	strmap_init_with_options(&set->map, strdup_strings);
+}
+
+static inline void strset_clear(struct strset *set)
+{
+	strmap_clear(&set->map, 0);
+}
+
+static inline void strset_partial_clear(struct strset *set)
+{
+	strmap_partial_clear(&set->map, 0);
+}
+
+static inline int strset_contains(struct strset *set, const char *str)
+{
+	return strmap_contains(&set->map, str);
+}
+
+static inline void strset_remove(struct strset *set, const char *str)
+{
+	return strmap_remove(&set->map, str, 0);
+}
+
+static inline int strset_empty(struct strset *set)
+{
+	return strmap_empty(&set->map);
+}
+
+static inline unsigned int strset_get_size(struct strset *set)
+{
+	return strmap_get_size(&set->map);
+}
+
+/* Returns 1 if str is added to the set; returns 0 if str was already in set */
+int strset_add(struct strset *set, const char *str);
+
 #endif /* STRMAP_H */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 12/15] strmap: enable allocations to come from a mem_pool
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
                             ` (10 preceding siblings ...)
  2020-11-11 20:02           ` [PATCH v6 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
                             ` (3 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For heavy users of strmaps, allowing the keys and entries to be
allocated from a memory pool can provide significant overhead savings.
Add an option to strmap_init_with_options() to specify a memory pool.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 31 ++++++++++++++++++++++---------
 strmap.h | 11 ++++++++---
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/strmap.c b/strmap.c
index 3784865745..139afb9d4b 100644
--- a/strmap.c
+++ b/strmap.c
@@ -1,5 +1,6 @@
 #include "git-compat-util.h"
 #include "strmap.h"
+#include "mem-pool.h"
 
 int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
 		     const struct hashmap_entry *entry1,
@@ -24,13 +25,15 @@ static struct strmap_entry *find_strmap_entry(struct strmap *map,
 
 void strmap_init(struct strmap *map)
 {
-	strmap_init_with_options(map, 1);
+	strmap_init_with_options(map, NULL, 1);
 }
 
 void strmap_init_with_options(struct strmap *map,
+			      struct mem_pool *pool,
 			      int strdup_strings)
 {
 	hashmap_init(&map->map, cmp_strmap_entry, NULL, 0);
+	map->pool = pool;
 	map->strdup_strings = strdup_strings;
 }
 
@@ -42,6 +45,10 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	if (!map)
 		return;
 
+	if (!free_values && map->pool)
+		/* Memory other than util is owned by and freed with the pool */
+		return;
+
 	/*
 	 * We need to iterate over the hashmap entries and free
 	 * e->key and e->value ourselves; hashmap has no API to
@@ -52,9 +59,11 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	hashmap_for_each_entry(&map->map, &iter, e, ent) {
 		if (free_values)
 			free(e->value);
-		if (map->strdup_strings)
-			free((char*)e->key);
-		free(e);
+		if (!map->pool) {
+			if (map->strdup_strings)
+				free((char*)e->key);
+			free(e);
+		}
 	}
 }
 
@@ -77,11 +86,13 @@ static struct strmap_entry *create_entry(struct strmap *map,
 	struct strmap_entry *entry;
 	const char *key = str;
 
-	entry = xmalloc(sizeof(*entry));
+	entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
+			  : xmalloc(sizeof(*entry));
 	hashmap_entry_init(&entry->ent, strhash(str));
 
 	if (map->strdup_strings)
-		key = xstrdup(str);
+		key = map->pool ? mem_pool_strdup(map->pool, str)
+				: xstrdup(str);
 	entry->key = key;
 	entry->value = data;
 	return entry;
@@ -128,9 +139,11 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		return;
 	if (free_value)
 		free(ret->value);
-	if (map->strdup_strings)
-		free((char*)ret->key);
-	free(ret);
+	if (!map->pool) {
+		if (map->strdup_strings)
+			free((char*)ret->key);
+		free(ret);
+	}
 }
 
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
diff --git a/strmap.h b/strmap.h
index c8c4d7c932..8745b7ceb1 100644
--- a/strmap.h
+++ b/strmap.h
@@ -3,8 +3,10 @@
 
 #include "hashmap.h"
 
+struct mem_pool;
 struct strmap {
 	struct hashmap map;
+	struct mem_pool *pool;
 	unsigned int strdup_strings:1;
 };
 
@@ -37,9 +39,10 @@ void strmap_init(struct strmap *map);
 
 /*
  * Same as strmap_init, but for those who want to control the memory management
- * carefully instead of using the default of strdup_strings=1.
+ * carefully instead of using the default of strdup_strings=1 and pool=NULL.
  */
 void strmap_init_with_options(struct strmap *map,
+			      struct mem_pool *pool,
 			      int strdup_strings);
 
 /*
@@ -137,9 +140,10 @@ static inline void strintmap_init(struct strintmap *map, int default_value)
 
 static inline void strintmap_init_with_options(struct strintmap *map,
 					       int default_value,
+					       struct mem_pool *pool,
 					       int strdup_strings)
 {
-	strmap_init_with_options(&map->map, strdup_strings);
+	strmap_init_with_options(&map->map, pool, strdup_strings);
 	map->default_value = default_value;
 }
 
@@ -221,9 +225,10 @@ static inline void strset_init(struct strset *set)
 }
 
 static inline void strset_init_with_options(struct strset *set,
+					    struct mem_pool *pool,
 					    int strdup_strings)
 {
-	strmap_init_with_options(&set->map, strdup_strings);
+	strmap_init_with_options(&set->map, pool, strdup_strings);
 }
 
 static inline void strset_clear(struct strset *set)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 144+ messages in thread

* [PATCH v6 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
                             ` (11 preceding siblings ...)
  2020-11-11 20:02           ` [PATCH v6 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
@ 2020-11-11 20:02           ` Elijah Newren via GitGitGadget
  2020-11-11 20:02           ` [PATCH v6 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
                             ` (2 subsequent siblings)
  15 siblings, 0 replies; 144+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-11 20:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Phillip Wood, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

By default, we do not use a mempool and strdup_strings is true; in this
case, we can avoid both an extra allocation and an extra free by just
over-allocating for the strmap_entry leaving enough space at the end to
copy the key.  FLEXPTR_ALLOC_STR exists for exactly this purpose, so
make use of it.

Also, adjust the case when we are using a memory pool and strdup_strings
is true to just do one allocation from the memory pool instead of two so
that the strmap_clear() and strmap_remove() code can just avoid freeing
the key in all cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 35 +++++++++++++++++++----------------
 strmap.h |  1 +
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/strmap.c b/strmap.c
index 139afb9d4b..4fb9f6100e 100644
--- a/strmap.c
+++ b/strmap.c
@@ -59,11 +59,8 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	hashmap_for_each_entry(&map->map, &iter, e, ent) {
 		if (free_values)
 			free(e->value);
-		if (!map->pool) {
-			if (map->strdup_strings)
-				free((char*)e->key);
+		if (!map->pool)
 			free(e);
-		}
 	}
 }
 
@@ -84,16 +81,25 @@ static struct strmap_entry *create_entry(struct strmap *map,
 					 void *data)
 {
 	struct strmap_entry *entry;
-	const char *key = str;
 
-	entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
-			  : xmalloc(sizeof(*entry));
+	if (map->strdup_strings) {
+		if (!map->pool) {
+			FLEXPTR_ALLOC_STR(entry, key, str);
+		} else {
+			size_t len = st_add(strlen(str), 1); /* include NUL */
+			entry = mem_pool_alloc(map->pool,
+					       st_add(sizeof(*entry), len));
+			memcpy(entry + 1, str, len);
+			entry->key = (void *)(entry + 1);
+		}
+	} else if (!map->pool) {
+		entry = xmalloc(sizeof(*entry));
+	} else {
+		entry = mem_pool_alloc(map->pool, sizeof(*entry));
+	}
 	hashmap_entry_init(&entry->ent, strhash(str));
-
-	if (map->strdup_strings)
-		key = map->pool ? mem_pool_strdup(map->pool, str)
-				: xstrdup(str);
-	entry->key = key;
+	if (!map->strdup_strings)
+		entry->key = str;
 	entry->value = data;
 	return entry;
 }
@@ -139,11 +145,8 @@ void strmap_remove(struct strmap *map, const char *str, int free_value)
 		return;
 	if (free_value)
 		free(ret->value);
-	if (!map->pool) {
-		if (map->strdup_strings)
-			free((char*)ret->key);
+	if (!map->pool)
 		free(ret);
-	}
 }
 
 void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
diff --git a/strmap.h b/strmap.h
index 8745b7ceb1..c4c104411b 100644
--- a/strmap.h
+++ b/strmap.h
@@ -14,6 +14,7 @@ struct strmap_entry {
 	struct hashmap_entry ent;
 	const char *key;
 	void *value;
+	/* strmap_entry may be allocated extra space to store the key at end */
 };
 
 int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
-- 
gitgitgadget


^ permalink raw reply	[