git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Introduce data field in hashmap and migrate docs to header
@ 2017-06-29  1:13 Stefan Beller
  2017-06-29  1:13 ` [PATCH 1/2] hashmap.h: compare function has access to a data field Stefan Beller
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-29  1:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

https://public-inbox.org/git/xmqqpodnvmmw.fsf@gitster.mtv.corp.google.com/
for context why we need a new data field.  Implement that.

Once upon a time we had a long discussion where to put documentation best.
The answer was header files as there documentation has less chance
to become stale and be out of date.  Improve the docs by
* migrating them to the header
* clarifying how the compare function is to be used
* how the arguments to hashmap_get/remove should be used.

Thanks,
Stefan

Stefan Beller (2):
  hashmap.h: compare function has access to a data field
  hashmap: migrate documentation from Documentation/technical into
    header

 Documentation/technical/api-hashmap.txt | 309 --------------------------------
 attr.c                                  |   4 +-
 builtin/describe.c                      |   6 +-
 builtin/difftool.c                      |  20 ++-
 builtin/fast-export.c                   |   5 +-
 config.c                                |   7 +-
 convert.c                               |   3 +-
 diffcore-rename.c                       |   2 +-
 hashmap.c                               |  17 +-
 hashmap.h                               | 256 +++++++++++++++++++++++---
 name-hash.c                             |  12 +-
 oidset.c                                |   5 +-
 patch-ids.c                             |   6 +-
 refs.c                                  |   4 +-
 remote.c                                |   7 +-
 sha1_file.c                             |   5 +-
 sub-process.c                           |   5 +-
 sub-process.h                           |   6 +-
 submodule-config.c                      |  10 +-
 t/helper/test-hashmap.c                 |  15 +-
 20 files changed, 325 insertions(+), 379 deletions(-)
 delete mode 100644 Documentation/technical/api-hashmap.txt

-- 
2.13.0.31.g9b732c453e


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/2] hashmap.h: compare function has access to a data field
  2017-06-29  1:13 [PATCH 0/2] Introduce data field in hashmap and migrate docs to header Stefan Beller
@ 2017-06-29  1:13 ` Stefan Beller
  2017-06-29 18:06   ` Junio C Hamano
  2017-06-29  1:13 ` [PATCH 2/2] hashmap: migrate documentation from Documentation/technical into header Stefan Beller
  2017-06-29 23:53 ` [PATCHv2 0/2] Introduce data field in hashmap and migrate docs to header Stefan Beller
  2 siblings, 1 reply; 26+ messages in thread
From: Stefan Beller @ 2017-06-29  1:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

When using the hashmap a common need is to have access to arbitrary data
in the compare function. A couple of times we abuse the keydata field
to pass in the data needed. This happens for example in patch-ids.c.

This patch changes the function signature of the compare function
to have one more void pointer available. The pointer given for each
invocation of the compare function must be defined in the init function
of the hashmap and is just passed through.

Documentation of this new feature is deferred to a later patch.

While at it improve the naming of the fields of all compare functions used
by hashmaps by ensuring unused parameters are prefixed with 'unused_' and
naming the parameters what they are (instead of 'unused' make it
'unused_keydata').

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 attr.c                  |  4 ++--
 builtin/describe.c      |  6 ++++--
 builtin/difftool.c      | 20 ++++++++++++--------
 builtin/fast-export.c   |  5 +++--
 config.c                |  7 +++++--
 convert.c               |  3 ++-
 diffcore-rename.c       |  2 +-
 hashmap.c               | 17 ++++++++++++-----
 hashmap.h               |  9 ++++++---
 name-hash.c             | 12 ++++++++----
 oidset.c                |  5 +++--
 patch-ids.c             |  6 ++++--
 refs.c                  |  4 ++--
 remote.c                |  7 +++++--
 sha1_file.c             |  5 +++--
 sub-process.c           |  5 +++--
 sub-process.h           |  6 ++++--
 submodule-config.c      | 10 ++++++----
 t/helper/test-hashmap.c | 15 ++++++++++-----
 19 files changed, 95 insertions(+), 53 deletions(-)

diff --git a/attr.c b/attr.c
index 37454999d2..2f51417675 100644
--- a/attr.c
+++ b/attr.c
@@ -78,7 +78,7 @@ struct attr_hash_entry {
 /* attr_hashmap comparison function */
 static int attr_hash_entry_cmp(const struct attr_hash_entry *a,
 			       const struct attr_hash_entry *b,
-			       void *unused)
+			       void *unused_keydata, void *unused_data)
 {
 	return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
 }
@@ -86,7 +86,7 @@ static int attr_hash_entry_cmp(const struct attr_hash_entry *a,
 /* Initialize an 'attr_hashmap' object */
 static void attr_hashmap_init(struct attr_hashmap *map)
 {
-	hashmap_init(&map->map, (hashmap_cmp_fn) attr_hash_entry_cmp, 0);
+	hashmap_init(&map->map, (hashmap_cmp_fn) attr_hash_entry_cmp, NULL, 0);
 }
 
 /*
diff --git a/builtin/describe.c b/builtin/describe.c
index 70eb144608..a6c5a969a0 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -55,7 +55,9 @@ static const char *prio_names[] = {
 };
 
 static int commit_name_cmp(const struct commit_name *cn1,
-		const struct commit_name *cn2, const void *peeled)
+			   const struct commit_name *cn2,
+			   const void *peeled,
+			   const void *unused_data)
 {
 	return oidcmp(&cn1->peeled, peeled ? peeled : &cn2->peeled);
 }
@@ -501,7 +503,7 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 		return cmd_name_rev(args.argc, args.argv, prefix);
 	}
 
-	hashmap_init(&names, (hashmap_cmp_fn) commit_name_cmp, 0);
+	hashmap_init(&names, (hashmap_cmp_fn) commit_name_cmp, NULL, 0);
 	for_each_rawref(get_name, NULL);
 	if (!names.size && !always)
 		die(_("No names found, cannot describe anything."));
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 9199227f6e..80786a95ab 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -131,7 +131,9 @@ struct working_tree_entry {
 };
 
 static int working_tree_entry_cmp(struct working_tree_entry *a,
-				  struct working_tree_entry *b, void *keydata)
+				  struct working_tree_entry *b,
+				  void *unused_keydata,
+				  void *unused_data)
 {
 	return strcmp(a->path, b->path);
 }
@@ -146,7 +148,8 @@ struct pair_entry {
 	const char path[FLEX_ARRAY];
 };
 
-static int pair_cmp(struct pair_entry *a, struct pair_entry *b, void *keydata)
+static int pair_cmp(struct pair_entry *a, struct pair_entry *b,
+		    void *unused_keydata, void *unused_data)
 {
 	return strcmp(a->path, b->path);
 }
@@ -174,7 +177,8 @@ struct path_entry {
 	char path[FLEX_ARRAY];
 };
 
-static int path_entry_cmp(struct path_entry *a, struct path_entry *b, void *key)
+static int path_entry_cmp(struct path_entry *a, struct path_entry *b,
+			  void *key, void *unused_data)
 {
 	return strcmp(a->path, key ? key : b->path);
 }
@@ -367,9 +371,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	wtdir_len = wtdir.len;
 
 	hashmap_init(&working_tree_dups,
-		     (hashmap_cmp_fn)working_tree_entry_cmp, 0);
-	hashmap_init(&submodules, (hashmap_cmp_fn)pair_cmp, 0);
-	hashmap_init(&symlinks2, (hashmap_cmp_fn)pair_cmp, 0);
+		     (hashmap_cmp_fn)working_tree_entry_cmp, NULL, 0);
+	hashmap_init(&submodules, (hashmap_cmp_fn)pair_cmp, NULL, 0);
+	hashmap_init(&symlinks2, (hashmap_cmp_fn)pair_cmp, NULL, 0);
 
 	child.no_stdin = 1;
 	child.git_cmd = 1;
@@ -580,9 +584,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	 * files through the symlink.
 	 */
 	hashmap_init(&wt_modified, (hashmap_cmp_fn)path_entry_cmp,
-		     wtindex.cache_nr);
+		     NULL, wtindex.cache_nr);
 	hashmap_init(&tmp_modified, (hashmap_cmp_fn)path_entry_cmp,
-		     wtindex.cache_nr);
+		     NULL, wtindex.cache_nr);
 
 	for (i = 0; i < wtindex.cache_nr; i++) {
 		struct hashmap_entry dummy;
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 12d501bfde..fa389fe252 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -94,7 +94,8 @@ struct anonymized_entry {
 };
 
 static int anonymized_entry_cmp(const void *va, const void *vb,
-				const void *data)
+				const void *unused_keydata,
+				const void *unused_data)
 {
 	const struct anonymized_entry *a = va, *b = vb;
 	return a->orig_len != b->orig_len ||
@@ -113,7 +114,7 @@ static const void *anonymize_mem(struct hashmap *map,
 	struct anonymized_entry key, *ret;
 
 	if (!map->cmpfn)
-		hashmap_init(map, anonymized_entry_cmp, 0);
+		hashmap_init(map, anonymized_entry_cmp, NULL, 0);
 
 	hashmap_entry_init(&key, memhash(orig, *len));
 	key.orig = orig;
diff --git a/config.c b/config.c
index 1cd40a5fe6..aa0198a5fd 100644
--- a/config.c
+++ b/config.c
@@ -1754,14 +1754,17 @@ static int configset_add_value(struct config_set *cs, const char *key, const cha
 }
 
 static int config_set_element_cmp(const struct config_set_element *e1,
-				 const struct config_set_element *e2, const void *unused)
+				 const struct config_set_element *e2,
+				 const void *unused_keydata,
+				 const void *unused_data)
 {
 	return strcmp(e1->key, e2->key);
 }
 
 void git_configset_init(struct config_set *cs)
 {
-	hashmap_init(&cs->config_hash, (hashmap_cmp_fn)config_set_element_cmp, 0);
+	hashmap_init(&cs->config_hash, (hashmap_cmp_fn)config_set_element_cmp,
+		     NULL, 0);
 	cs->hash_initialized = 1;
 	cs->list.nr = 0;
 	cs->list.alloc = 0;
diff --git a/convert.c b/convert.c
index 7d2a519daf..deaf0ba7b3 100644
--- a/convert.c
+++ b/convert.c
@@ -583,7 +583,8 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 
 	if (!subprocess_map_initialized) {
 		subprocess_map_initialized = 1;
-		hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp,
+			     NULL, 0);
 		entry = NULL;
 	} else {
 		entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd);
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1e4678b7d7..786f389498 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -341,7 +341,7 @@ static int find_exact_renames(struct diff_options *options)
 	/* Add all sources to the hash table in reverse order, because
 	 * later on they will be retrieved in LIFO order.
 	 */
-	hashmap_init(&file_table, NULL, rename_src_nr);
+	hashmap_init(&file_table, NULL, NULL, rename_src_nr);
 	for (i = rename_src_nr-1; i >= 0; i--)
 		insert_file_table(&file_table, i, rename_src[i].p->one);
 
diff --git a/hashmap.c b/hashmap.c
index 7d1044eb5d..82d70b329b 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -95,7 +95,9 @@ static inline int entry_equals(const struct hashmap *map,
 		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
 		const void *keydata)
 {
-	return (e1 == e2) || (e1->hash == e2->hash && !map->cmpfn(e1, e2, keydata));
+	return (e1 == e2) ||
+	       (e1->hash == e2->hash &&
+		!map->cmpfn(e1, e2, keydata, map->data));
 }
 
 static inline unsigned int bucket(const struct hashmap *map,
@@ -140,19 +142,23 @@ static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
 	return e;
 }
 
-static int always_equal(const void *unused1, const void *unused2, const void *unused3)
+static int always_equal(const void *unused1,
+			const void *unused2,
+			const void *unused_keydata,
+			const void *unused_data)
 {
 	return 0;
 }
 
 void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		size_t initial_size)
+		const void *data, size_t initial_size)
 {
 	unsigned int size = HASHMAP_INITIAL_SIZE;
 
 	memset(map, 0, sizeof(*map));
 
 	map->cmpfn = equals_function ? equals_function : always_equal;
+	map->data = data;
 
 	/* calculate initial table size and allocate the table */
 	initial_size = (unsigned int) ((uint64_t) initial_size * 100
@@ -262,7 +268,8 @@ struct pool_entry {
 
 static int pool_entry_cmp(const struct pool_entry *e1,
 			  const struct pool_entry *e2,
-			  const unsigned char *keydata)
+			  const unsigned char *keydata,
+			  const void *unused_data)
 {
 	return e1->data != keydata &&
 	       (e1->len != e2->len || memcmp(e1->data, keydata, e1->len));
@@ -275,7 +282,7 @@ const void *memintern(const void *data, size_t len)
 
 	/* initialize string pool hashmap */
 	if (!map.tablesize)
-		hashmap_init(&map, (hashmap_cmp_fn) pool_entry_cmp, 0);
+		hashmap_init(&map, (hashmap_cmp_fn) pool_entry_cmp, NULL, 0);
 
 	/* lookup interned string in pool */
 	hashmap_entry_init(&key, memhash(data, len));
diff --git a/hashmap.h b/hashmap.h
index de6022a3a9..1c26bbad5b 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -33,11 +33,12 @@ struct hashmap_entry {
 };
 
 typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
-		const void *keydata);
+		const void *keydata, const void *cbdata);
 
 struct hashmap {
 	struct hashmap_entry **table;
 	hashmap_cmp_fn cmpfn;
+	const void *data;
 	unsigned int size, tablesize, grow_at, shrink_at;
 	unsigned disallow_rehash : 1;
 };
@@ -50,8 +51,10 @@ struct hashmap_iter {
 
 /* hashmap functions */
 
-extern void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		size_t initial_size);
+extern void hashmap_init(struct hashmap *map,
+			 hashmap_cmp_fn equals_function,
+			 const void *data,
+			 size_t initial_size);
 extern void hashmap_free(struct hashmap *map, int free_entries);
 
 /* hashmap_entry functions */
diff --git a/name-hash.c b/name-hash.c
index 39309efb7f..4e9eac925a 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -17,7 +17,9 @@ struct dir_entry {
 };
 
 static int dir_entry_cmp(const struct dir_entry *e1,
-		const struct dir_entry *e2, const char *name)
+			 const struct dir_entry *e2,
+			 const char *name,
+			 const void *unused_data)
 {
 	return e1->namelen != e2->namelen || strncasecmp(e1->name,
 			name ? name : e2->name, e1->namelen);
@@ -108,7 +110,9 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 }
 
 static int cache_entry_cmp(const struct cache_entry *ce1,
-		const struct cache_entry *ce2, const void *remove)
+			   const struct cache_entry *ce2,
+			   const void *remove,
+			   const void *unused_data)
 {
 	/*
 	 * For remove_name_hash, find the exact entry (pointer equality); for
@@ -571,9 +575,9 @@ static void lazy_init_name_hash(struct index_state *istate)
 	if (istate->name_hash_initialized)
 		return;
 	hashmap_init(&istate->name_hash, (hashmap_cmp_fn) cache_entry_cmp,
-			istate->cache_nr);
+			NULL, istate->cache_nr);
 	hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp,
-			istate->cache_nr);
+			NULL, istate->cache_nr);
 
 	if (lookup_lazy_params(istate)) {
 		hashmap_disallow_rehash(&istate->dir_hash, 1);
diff --git a/oidset.c b/oidset.c
index ac169f05d3..f20e6e4df3 100644
--- a/oidset.c
+++ b/oidset.c
@@ -7,7 +7,8 @@ struct oidset_entry {
 };
 
 static int oidset_hashcmp(const void *va, const void *vb,
-			  const void *vkey)
+			  const void *vkey,
+			  const void *unused_data)
 {
 	const struct oidset_entry *a = va, *b = vb;
 	const struct object_id *key = vkey;
@@ -30,7 +31,7 @@ int oidset_insert(struct oidset *set, const struct object_id *oid)
 	struct oidset_entry *entry;
 
 	if (!set->map.cmpfn)
-		hashmap_init(&set->map, oidset_hashcmp, 0);
+		hashmap_init(&set->map, oidset_hashcmp, NULL, 0);
 
 	if (oidset_contains(set, oid))
 		return 1;
diff --git a/patch-ids.c b/patch-ids.c
index 9c0ab9e67a..b9b2ebbad0 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -37,6 +37,7 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
  */
 static int patch_id_cmp(struct patch_id *a,
 			struct patch_id *b,
+			const void *unused_keydata,
 			struct diff_options *opt)
 {
 	if (is_null_oid(&a->patch_id) &&
@@ -57,7 +58,8 @@ int init_patch_ids(struct patch_ids *ids)
 	ids->diffopts.detect_rename = 0;
 	DIFF_OPT_SET(&ids->diffopts, RECURSIVE);
 	diff_setup_done(&ids->diffopts);
-	hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp, 256);
+	hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp,
+		     &ids->diffopts, 256);
 	return 0;
 }
 
@@ -93,7 +95,7 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
 	if (init_patch_id_entry(&patch, commit, ids))
 		return NULL;
 
-	return hashmap_get(&ids->patches, &patch, &ids->diffopts);
+	return hashmap_get(&ids->patches, &patch, NULL);
 }
 
 struct patch_id *add_commit_patch_id(struct commit *commit,
diff --git a/refs.c b/refs.c
index 88658ba769..71e0170feb 100644
--- a/refs.c
+++ b/refs.c
@@ -1526,7 +1526,7 @@ struct ref_store_hash_entry
 };
 
 static int ref_store_hash_cmp(const void *entry, const void *entry_or_key,
-			      const void *keydata)
+			      const void *keydata, const void *unused_data)
 {
 	const struct ref_store_hash_entry *e1 = entry, *e2 = entry_or_key;
 	const char *name = keydata ? keydata : e2->name;
@@ -1608,7 +1608,7 @@ static void register_ref_store_map(struct hashmap *map,
 				   const char *name)
 {
 	if (!map->tablesize)
-		hashmap_init(map, ref_store_hash_cmp, 0);
+		hashmap_init(map, ref_store_hash_cmp, NULL, 0);
 
 	if (hashmap_put(map, alloc_ref_store_hash_entry(name, refs)))
 		die("BUG: %s ref_store '%s' initialized twice", type, name);
diff --git a/remote.c b/remote.c
index d87482573d..b5e05890ee 100644
--- a/remote.c
+++ b/remote.c
@@ -133,7 +133,10 @@ struct remotes_hash_key {
 	int len;
 };
 
-static int remotes_hash_cmp(const struct remote *a, const struct remote *b, const struct remotes_hash_key *key)
+static int remotes_hash_cmp(const struct remote *a,
+			    const struct remote *b,
+			    const struct remotes_hash_key *key,
+			    const void *unused_data)
 {
 	if (key)
 		return strncmp(a->name, key->str, key->len) || a->name[key->len];
@@ -144,7 +147,7 @@ static int remotes_hash_cmp(const struct remote *a, const struct remote *b, cons
 static inline void init_remotes_hash(void)
 {
 	if (!remotes_hash.cmpfn)
-		hashmap_init(&remotes_hash, (hashmap_cmp_fn)remotes_hash_cmp, 0);
+		hashmap_init(&remotes_hash, (hashmap_cmp_fn)remotes_hash_cmp, NULL, 0);
 }
 
 static struct remote *make_remote(const char *name, int len)
diff --git a/sha1_file.c b/sha1_file.c
index fb1fd809dc..b748db31ed 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2390,7 +2390,8 @@ static int delta_base_cache_key_eq(const struct delta_base_cache_key *a,
 }
 
 static int delta_base_cache_hash_cmp(const void *va, const void *vb,
-				     const void *vkey)
+				     const void *vkey,
+				     const void *unused_data)
 {
 	const struct delta_base_cache_entry *a = va, *b = vb;
 	const struct delta_base_cache_key *key = vkey;
@@ -2472,7 +2473,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	list_add_tail(&ent->lru, &delta_base_cache_lru);
 
 	if (!delta_base_cache.cmpfn)
-		hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, 0);
+		hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, NULL, 0);
 	hashmap_entry_init(ent, pack_entry_hash(p, base_offset));
 	hashmap_add(&delta_base_cache, ent);
 }
diff --git a/sub-process.c b/sub-process.c
index 92f8aea70a..68b53cf48a 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -6,8 +6,9 @@
 #include "pkt-line.h"
 
 int cmd2process_cmp(const struct subprocess_entry *e1,
-			   const struct subprocess_entry *e2,
-			   const void *unused)
+		    const struct subprocess_entry *e2,
+		    const void *unused_keydata,
+		    const void *unused_data)
 {
 	return strcmp(e1->cmd, e2->cmd);
 }
diff --git a/sub-process.h b/sub-process.h
index d9a45cd359..a19b4d9936 100644
--- a/sub-process.h
+++ b/sub-process.h
@@ -20,8 +20,10 @@ struct subprocess_entry {
 
 /* subprocess functions */
 
-int cmd2process_cmp(const struct subprocess_entry *e1,
-	const struct subprocess_entry *e2, const void *unused);
+extern int cmd2process_cmp(const struct subprocess_entry *e1,
+			   const struct subprocess_entry *e2,
+			   const void *unused_keydata,
+			   const void *unused_data);
 
 typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
 int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
diff --git a/submodule-config.c b/submodule-config.c
index d8f8d5ea32..5c2fc489b8 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -36,7 +36,8 @@ static int is_cache_init;
 
 static int config_path_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
-			   const void *unused)
+			   const void *unused_keydata,
+			   const void *unused_data)
 {
 	return strcmp(a->config->path, b->config->path) ||
 	       hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
@@ -44,7 +45,8 @@ static int config_path_cmp(const struct submodule_entry *a,
 
 static int config_name_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
-			   const void *unused)
+			   const void *unused_keydata,
+			   const void *unused_data)
 {
 	return strcmp(a->config->name, b->config->name) ||
 	       hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
@@ -52,8 +54,8 @@ static int config_name_cmp(const struct submodule_entry *a,
 
 static void cache_init(struct submodule_cache *cache)
 {
-	hashmap_init(&cache->for_path, (hashmap_cmp_fn) config_path_cmp, 0);
-	hashmap_init(&cache->for_name, (hashmap_cmp_fn) config_name_cmp, 0);
+	hashmap_init(&cache->for_path, (hashmap_cmp_fn) config_path_cmp, NULL, 0);
+	hashmap_init(&cache->for_name, (hashmap_cmp_fn) config_name_cmp, NULL, 0);
 }
 
 static void free_one_config(struct submodule_entry *entry)
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index 7aa9440e27..72cdc4a6b1 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -14,13 +14,17 @@ static const char *get_value(const struct test_entry *e)
 }
 
 static int test_entry_cmp(const struct test_entry *e1,
-		const struct test_entry *e2, const char* key)
+			  const struct test_entry *e2,
+			  const char* key,
+			  const void *unused_data)
 {
 	return strcmp(e1->key, key ? key : e2->key);
 }
 
 static int test_entry_cmp_icase(const struct test_entry *e1,
-		const struct test_entry *e2, const char* key)
+				const struct test_entry *e2,
+				const char* key,
+				const void *unused_data)
 {
 	return strcasecmp(e1->key, key ? key : e2->key);
 }
@@ -92,7 +96,8 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 	if (method & TEST_ADD) {
 		/* test adding to the map */
 		for (j = 0; j < rounds; j++) {
-			hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+			hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp,
+				     NULL, 0);
 
 			/* add entries */
 			for (i = 0; i < TEST_SIZE; i++) {
@@ -104,7 +109,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 		}
 	} else {
 		/* test map lookups */
-		hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+		hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, NULL, 0);
 
 		/* fill the map (sparsely if specified) */
 		j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE;
@@ -147,7 +152,7 @@ int cmd_main(int argc, const char **argv)
 	/* init hash map */
 	icase = argc > 1 && !strcmp("ignorecase", argv[1]);
 	hashmap_init(&map, (hashmap_cmp_fn) (icase ? test_entry_cmp_icase
-			: test_entry_cmp), 0);
+			: test_entry_cmp), NULL, 0);
 
 	/* process commands from stdin */
 	while (fgets(line, sizeof(line), stdin)) {
-- 
2.13.0.31.g9b732c453e


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/2] hashmap: migrate documentation from Documentation/technical into header
  2017-06-29  1:13 [PATCH 0/2] Introduce data field in hashmap and migrate docs to header Stefan Beller
  2017-06-29  1:13 ` [PATCH 1/2] hashmap.h: compare function has access to a data field Stefan Beller
@ 2017-06-29  1:13 ` Stefan Beller
  2017-06-29 13:25   ` Jeff Hostetler
  2017-06-29 21:18   ` Jonathan Nieder
  2017-06-29 23:53 ` [PATCHv2 0/2] Introduce data field in hashmap and migrate docs to header Stefan Beller
  2 siblings, 2 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-29  1:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

While at it, clarify the use of `key`, `keydata`, `entry_or_key` as well
as documenting the new data pointer for the compare function.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/technical/api-hashmap.txt | 309 --------------------------------
 hashmap.h                               | 249 +++++++++++++++++++++++--
 2 files changed, 231 insertions(+), 327 deletions(-)
 delete mode 100644 Documentation/technical/api-hashmap.txt

diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
deleted file mode 100644
index ccc634bbd7..0000000000
--- a/Documentation/technical/api-hashmap.txt
+++ /dev/null
@@ -1,309 +0,0 @@
-hashmap API
-===========
-
-The hashmap API is a generic implementation of hash-based key-value mappings.
-
-Data Structures
----------------
-
-`struct hashmap`::
-
-	The hash table structure. Members can be used as follows, but should
-	not be modified directly:
-+
-The `size` member keeps track of the total number of entries (0 means the
-hashmap is empty).
-+
-`tablesize` is the allocated size of the hash table. A non-0 value indicates
-that the hashmap is initialized. It may also be useful for statistical purposes
-(i.e. `size / tablesize` is the current load factor).
-+
-`cmpfn` stores the comparison function specified in `hashmap_init()`. In
-advanced scenarios, it may be useful to change this, e.g. to switch between
-case-sensitive and case-insensitive lookup.
-+
-When `disallow_rehash` is set, automatic rehashes are prevented during inserts
-and deletes.
-
-`struct hashmap_entry`::
-
-	An opaque structure representing an entry in the hash table, which must
-	be used as first member of user data structures. Ideally it should be
-	followed by an int-sized member to prevent unused memory on 64-bit
-	systems due to alignment.
-+
-The `hash` member is the entry's hash code and the `next` member points to the
-next entry in case of collisions (i.e. if multiple entries map to the same
-bucket).
-
-`struct hashmap_iter`::
-
-	An iterator structure, to be used with hashmap_iter_* functions.
-
-Types
------
-
-`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
-
-	User-supplied function to test two hashmap entries for equality. Shall
-	return 0 if the entries are equal.
-+
-This function is always called with non-NULL `entry` / `entry_or_key`
-parameters that have the same hash code. When looking up an entry, the `key`
-and `keydata` parameters to hashmap_get and hashmap_remove are always passed
-as second and third argument, respectively. Otherwise, `keydata` is NULL.
-
-Functions
----------
-
-`unsigned int strhash(const char *buf)`::
-`unsigned int strihash(const char *buf)`::
-`unsigned int memhash(const void *buf, size_t len)`::
-`unsigned int memihash(const void *buf, size_t len)`::
-`unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len)`::
-
-	Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
-	http://www.isthe.com/chongo/tech/comp/fnv).
-+
-`strhash` and `strihash` take 0-terminated strings, while `memhash` and
-`memihash` operate on arbitrary-length memory.
-+
-`strihash` and `memihash` are case insensitive versions.
-+
-`memihash_cont` is a variant of `memihash` that allows a computation to be
-continued with another chunk of data.
-
-`unsigned int sha1hash(const unsigned char *sha1)`::
-
-	Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
-	for use in hash tables. Cryptographic hashes are supposed to have
-	uniform distribution, so in contrast to `memhash()`, this just copies
-	the first `sizeof(int)` bytes without shuffling any bits. Note that
-	the results will be different on big-endian and little-endian
-	platforms, so they should not be stored or transferred over the net.
-
-`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
-
-	Initializes a hashmap structure.
-+
-`map` is the hashmap to initialize.
-+
-The `equals_function` can be specified to compare two entries for equality.
-If NULL, entries are considered equal if their hash codes are equal.
-+
-If the total number of entries is known in advance, the `initial_size`
-parameter may be used to preallocate a sufficiently large table and thus
-prevent expensive resizing. If 0, the table is dynamically resized.
-
-`void hashmap_free(struct hashmap *map, int free_entries)`::
-
-	Frees a hashmap structure and allocated memory.
-+
-`map` is the hashmap to free.
-+
-If `free_entries` is true, each hashmap_entry in the map is freed as well
-(using stdlib's free()).
-
-`void hashmap_entry_init(void *entry, unsigned int hash)`::
-
-	Initializes a hashmap_entry structure.
-+
-`entry` points to the entry to initialize.
-+
-`hash` is the hash code of the entry.
-+
-The hashmap_entry structure does not hold references to external resources,
-and it is safe to just discard it once you are done with it (i.e. if
-your structure was allocated with xmalloc(), you can just free(3) it,
-and if it is on stack, you can just let it go out of scope).
-
-`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
-
-	Returns the hashmap entry for the specified key, or NULL if not found.
-+
-`map` is the hashmap structure.
-+
-`key` is a hashmap_entry structure (or user data structure that starts with
-hashmap_entry) that has at least been initialized with the proper hash code
-(via `hashmap_entry_init`).
-+
-If an entry with matching hash code is found, `key` and `keydata` are passed
-to `hashmap_cmp_fn` to decide whether the entry matches the key.
-
-`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
-
-	Returns the hashmap entry for the specified hash code and key data,
-	or NULL if not found.
-+
-`map` is the hashmap structure.
-+
-`hash` is the hash code of the entry to look up.
-+
-If an entry with matching hash code is found, `keydata` is passed to
-`hashmap_cmp_fn` to decide whether the entry matches the key. The
-`entry_or_key` parameter points to a bogus hashmap_entry structure that
-should not be used in the comparison.
-
-`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
-
-	Returns the next equal hashmap entry, or NULL if not found. This can be
-	used to iterate over duplicate entries (see `hashmap_add`).
-+
-`map` is the hashmap structure.
-+
-`entry` is the hashmap_entry to start the search from, obtained via a previous
-call to `hashmap_get` or `hashmap_get_next`.
-
-`void hashmap_add(struct hashmap *map, void *entry)`::
-
-	Adds a hashmap entry. This allows to add duplicate entries (i.e.
-	separate values with the same key according to hashmap_cmp_fn).
-+
-`map` is the hashmap structure.
-+
-`entry` is the entry to add.
-
-`void *hashmap_put(struct hashmap *map, void *entry)`::
-
-	Adds or replaces a hashmap entry. If the hashmap contains duplicate
-	entries equal to the specified entry, only one of them will be replaced.
-+
-`map` is the hashmap structure.
-+
-`entry` is the entry to add or replace.
-+
-Returns the replaced entry, or NULL if not found (i.e. the entry was added).
-
-`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
-
-	Removes a hashmap entry matching the specified key. If the hashmap
-	contains duplicate entries equal to the specified key, only one of
-	them will be removed.
-+
-`map` is the hashmap structure.
-+
-`key` is a hashmap_entry structure (or user data structure that starts with
-hashmap_entry) that has at least been initialized with the proper hash code
-(via `hashmap_entry_init`).
-+
-If an entry with matching hash code is found, `key` and `keydata` are
-passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
-+
-Returns the removed entry, or NULL if not found.
-
-`void hashmap_disallow_rehash(struct hashmap *map, unsigned value)`::
-
-	Disallow/allow automatic rehashing of the hashmap during inserts
-	and deletes.
-+
-This is useful if the caller knows that the hashmap will be accessed
-by multiple threads.
-+
-The caller is still responsible for any necessary locking; this simply
-prevents unexpected rehashing.  The caller is also responsible for properly
-sizing the initial hashmap to ensure good performance.
-+
-A call to allow rehashing does not force a rehash; that might happen
-with the next insert or delete.
-
-`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
-`void *hashmap_iter_next(struct hashmap_iter *iter)`::
-`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
-
-	Used to iterate over all entries of a hashmap. Note that it is
-	not safe to add or remove entries to the hashmap while
-	iterating.
-+
-`hashmap_iter_init` initializes a `hashmap_iter` structure.
-+
-`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
-more entries.
-+
-`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
-and returns the first entry, if any).
-
-`const char *strintern(const char *string)`::
-`const void *memintern(const void *data, size_t len)`::
-
-	Returns the unique, interned version of the specified string or data,
-	similar to the `String.intern` API in Java and .NET, respectively.
-	Interned strings remain valid for the entire lifetime of the process.
-+
-Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
-strings / data must not be modified or freed.
-+
-Interned strings are best used for short strings with high probability of
-duplicates.
-+
-Uses a hashmap to store the pool of interned strings.
-
-Usage example
--------------
-
-Here's a simple usage example that maps long keys to double values.
-------------
-struct hashmap map;
-
-struct long2double {
-	struct hashmap_entry ent; /* must be the first member! */
-	long key;
-	double value;
-};
-
-static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
-{
-	return !(e1->key == e2->key);
-}
-
-void long2double_init(void)
-{
-	hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
-}
-
-void long2double_free(void)
-{
-	hashmap_free(&map, 1);
-}
-
-static struct long2double *find_entry(long key)
-{
-	struct long2double k;
-	hashmap_entry_init(&k, memhash(&key, sizeof(long)));
-	k.key = key;
-	return hashmap_get(&map, &k, NULL);
-}
-
-double get_value(long key)
-{
-	struct long2double *e = find_entry(key);
-	return e ? e->value : 0;
-}
-
-void set_value(long key, double value)
-{
-	struct long2double *e = find_entry(key);
-	if (!e) {
-		e = malloc(sizeof(struct long2double));
-		hashmap_entry_init(e, memhash(&key, sizeof(long)));
-		e->key = key;
-		hashmap_add(&map, e);
-	}
-	e->value = value;
-}
-------------
-
-Using variable-sized keys
--------------------------
-
-The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
-`hashmap_entry` structure as key to find the correct entry. If the key data is
-variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
-to create a full-fledged entry structure on the heap and copy all the key data
-into the structure.
-
-In this case, the `keydata` parameter can be used to pass
-variable-sized key data directly to the comparison function, and the `key`
-parameter can be a stripped-down, fixed size entry structure allocated on the
-stack.
-
-See test-hashmap.c for an example using arbitrary-length strings as keys.
diff --git a/hashmap.h b/hashmap.h
index 1c26bbad5b..cd8c6f8ad3 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -3,11 +3,18 @@
 
 /*
  * Generic implementation of hash-based key-value mappings.
- * See Documentation/technical/api-hashmap.txt.
+ *
  */
 
-/* FNV-1 functions */
-
+/*
+ * Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
+ * http://www.isthe.com/chongo/tech/comp/fnv).
+ * `strhash` and `strihash` take 0-terminated strings, while `memhash` and
+ * `memihash` operate on arbitrary-length memory.
+ * `strihash` and `memihash` are case insensitive versions.
+ * `memihash_cont` is a variant of `memihash` that allows a computation to be
+ * continued with another chunk of data.
+ */
 extern unsigned int strhash(const char *buf);
 extern unsigned int strihash(const char *buf);
 extern unsigned int memhash(const void *buf, size_t len);
@@ -16,6 +23,15 @@ extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_
 
 static inline unsigned int sha1hash(const unsigned char *sha1)
 {
+	/*
+	 * Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
+	 * for use in hash tables. Cryptographic hashes are supposed to have
+	 * uniform distribution, so in contrast to `memhash()`, this just copies
+	 * the first `sizeof(int)` bytes without shuffling any bits. Note that
+	 * the results will be different on big-endian and little-endian
+	 * platforms, so they should not be stored or transferred over the net.
+	 */
+
 	/*
 	 * Equivalent to 'return *(unsigned int *)sha1;', but safe on
 	 * platforms that don't support unaligned reads.
@@ -25,62 +41,228 @@ static inline unsigned int sha1hash(const unsigned char *sha1)
 	return hash;
 }
 
-/* data structures */
-
+/*
+ * struct hashmap_entry is an opaque structure representing an entry in the
+ * hash table, which must be used as first member of user data structures.
+ * Ideally it should be followed by an int-sized member to prevent unused
+ * memory on 64-bit systems due to alignment.
+ */
 struct hashmap_entry {
+	/*
+	 * next points to the next entry in case of collisions (i.e. if
+	 * multiple entries map to the same bucket)
+	 */
 	struct hashmap_entry *next;
+
+	/* entry's hash code */
 	unsigned int hash;
 };
 
+/*
+ * User-supplied function to test two hashmap entries for equality. Shall
+ * return 0 if the entries are equal.
+ *
+ * This function is always called with non-NULL `entry` and `entry_or_key`
+ * parameters that have the same hash code.
+ *
+ * When looking up an entry, the `key` and `keydata` parameters to hashmap_get
+ * and hashmap_remove are always passed
+ * as second `entry_or_key` and third argument `keydata`, respectively.
+ * Otherwise, `keydata` is NULL.
+ *
+ * There are two modes for this function to be used:
+ * (a) When looking for an exact match of a given `entry`, then `keydata`
+ *     ought to be NULL, and this function should cast `entry` and
+ *     `entry_or_key` to the user entries and check for equality.
+ * (b) When it is too expensive to allocate such a user entry (either because
+ *     it is large or varialbe sized, such that it is not on the stack),
+ *     Then the relevant data to check for equality should be passed via
+ *     ` keydata`.
+ *
+ * Resulting from these modes, this function should compare `keydata` to `entry`
+ * when `keydata` is not NULL. `entry_or_key` may be a bogus hashmap_entry (see
+ * hashmap_get_from_hash).
+ *
+ * When `keydata` is NULL,`entry_or_key` is guaranteed to be a valid user supplied
+ * entry, such that the comparision should be made between `entry` and
+ * `entry_or_key`, both cast to the user type.
+ *
+ * The `data` entry is the pointer given in the init function and is always
+ * given.
+ */
 typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
-		const void *keydata, const void *cbdata);
+		const void *keydata, const void *data);
 
+/*
+ * struct hashmap is the hash table structure. Members can be used as follows,
+ * but should not be modified directly.
+ */
 struct hashmap {
 	struct hashmap_entry **table;
+
+	/* Stores the comparison function specified in `hashmap_init()`. */
 	hashmap_cmp_fn cmpfn;
 	const void *data;
-	unsigned int size, tablesize, grow_at, shrink_at;
-	unsigned disallow_rehash : 1;
-};
 
-struct hashmap_iter {
-	struct hashmap *map;
-	struct hashmap_entry *next;
-	unsigned int tablepos;
+	/* total number of entries (0 means the hashmap is empty) */
+	unsigned int size;
+
+	/*
+	 * tablesize is the allocated size of the hash table. A non-0 value
+	 * indicates that the hashmap is initialized. It may also be useful
+	 * for statistical purposes (i.e. `size / tablesize` is the current
+	 * load factor).
+	 */
+	unsigned int tablesize;
+
+	unsigned int grow_at;
+	unsigned int shrink_at;
+
+	/* Prevent automatic rehashes during inserts and deletes when set. */
+	unsigned disallow_rehash : 1;
 };
 
 /* hashmap functions */
 
+/*
+ * Initializes a hashmap structure.
+ *
+ * `map` is the hashmap to initialize.
+ *
+ * The `equals_function` can be specified to compare two entries for equality.
+ * If NULL, entries are considered equal if their hash codes are equal.
+ *
+ * When the compare function needs specific user supplied `data`, it should
+ * be given in `data`. It will be passed to the compare function in each call.
+ *
+ * If the total number of entries is known in advance, the `initial_size`
+ * parameter may be used to preallocate a sufficiently large table and thus
+ * prevent expensive resizing. If 0, the table is dynamically resized.
+ */
 extern void hashmap_init(struct hashmap *map,
 			 hashmap_cmp_fn equals_function,
 			 const void *data,
 			 size_t initial_size);
+
+/*
+ * Frees a hashmap structure and allocated memory.
+ *
+ * If `free_entries` is true, each hashmap_entry in the map is freed as well.
+ */
 extern void hashmap_free(struct hashmap *map, int free_entries);
 
 /* hashmap_entry functions */
 
+/*
+ * Initializes a hashmap_entry structure.
+ *
+ * `entry` points to the entry to initialize.
+ * `hash` is the hash code of the entry.
+ *
+ * The hashmap_entry structure does not hold references to external resources,
+ * and it is safe to just discard it once you are done with it (i.e. if
+ * your structure was allocated with xmalloc(), you can just free(3) it,
+ * and if it is on stack, you can just let it go out of scope).
+ */
 static inline void hashmap_entry_init(void *entry, unsigned int hash)
 {
 	struct hashmap_entry *e = entry;
 	e->hash = hash;
 	e->next = NULL;
 }
+
+/*
+ * Returns the hashmap entry for the specified key, or NULL if not found.
+ *
+ * `map` is the hashmap structure.
+ * `key` is a user data structure that starts with hashmap_entry that has at
+ *  least been initialized with the proper hash code (via `hashmap_entry_init`).
+ * `keydata` is a data structure that holds just enough information to check
+ * for equality to a given entry.
+ *
+ * One of `key` or `keydata` shall be NULL.
+ *
+ * If an entry with matching hash code is found, `key` and `keydata` are passed
+ * to `hashmap_cmp_fn` to decide whether the entry matches the key.
+ */
 extern void *hashmap_get(const struct hashmap *map, const void *key,
-		const void *keydata);
+			 const void *keydata);
+
+/*
+ * Returns the next equal hashmap entry, or NULL if not found. This can be
+ * used to iterate over duplicate entries (see `hashmap_add`).
+ *
+ * `map` is the hashmap structure.
+ * `entry` is the hashmap_entry to start the search from, obtained via a previous
+ * call to `hashmap_get` or `hashmap_get_next`.
+ */
 extern void *hashmap_get_next(const struct hashmap *map, const void *entry);
+
+/*
+ * Adds a hashmap entry. This allows to add duplicate entries (i.e.
+ * separate values with the same key according to hashmap_cmp_fn).
+ *
+ * `map` is the hashmap structure.
+ * `entry` is the entry to add.
+ */
 extern void hashmap_add(struct hashmap *map, void *entry);
+
+/*
+ * Adds or replaces a hashmap entry. If the hashmap contains duplicate
+ * entries equal to the specified entry, only one of them will be replaced.
+ *
+ * `map` is the hashmap structure.
+ * `entry` is the entry to add or replace.
+ * Returns the replaced entry, or NULL if not found (i.e. the entry was added).
+ */
 extern void *hashmap_put(struct hashmap *map, void *entry);
+
+/*
+ * Removes a hashmap entry matching the specified key. If the hashmap
+ * contains duplicate entries equal to the specified key, only one of
+ * them will be removed.
+ *
+ * `map` is the hashmap structure.
+ * `key` is a user data structure that starts with hashmap_entry that has at
+ *  least been initialized with the proper hash code (via `hashmap_entry_init`).
+ * `keydata` is a data structure that holds just enough information to check
+ * for equality to a given entry.
+ *
+ * One of `key` or `keydata` shall be NULL.
+ *
+ * If an entry with matching hash code is found, `key` and `keydata` are
+ * passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
+ *
+ * Returns the removed entry, or NULL if not found.
+ */
 extern void *hashmap_remove(struct hashmap *map, const void *key,
 		const void *keydata);
 
+/*
+ * Returns the hashmap entry for the specified hash code and key data,
+ * or NULL if not found.
+ *
+ * `map` is the hashmap structure.
+ * `hash` is the hash code of the entry to look up.
+ *
+ * If an entry with matching hash code is found, `keydata` is passed to
+ * `hashmap_cmp_fn` to decide whether the entry matches the key. The
+ * `entry_or_key` parameter points to a bogus hashmap_entry structure that
+ * should not be used in the comparison.
+ */
 static inline void *hashmap_get_from_hash(const struct hashmap *map,
-		unsigned int hash, const void *keydata)
+					  unsigned int hash,
+					  const void *keydata)
 {
 	struct hashmap_entry key;
 	hashmap_entry_init(&key, hash);
 	return hashmap_get(map, &key, keydata);
 }
 
+/*
+ * Returns the `bucket` an entry is stored in.
+ * Useful for multithreaded read access.
+ */
 int hashmap_bucket(const struct hashmap *map, unsigned int hash);
 
 /*
@@ -91,7 +273,7 @@ int hashmap_bucket(const struct hashmap *map, unsigned int hash);
  * manner appropriate to their usage.  This simply
  * prevents the table from being unexpectedly re-mapped.
  *
- * If is up to the caller to ensure that the hashmap is
+ * It is up to the caller to ensure that the hashmap is
  * initialized to a reasonable size to prevent poor
  * performance.
  *
@@ -104,10 +286,28 @@ static inline void hashmap_disallow_rehash(struct hashmap *map, unsigned value)
 	map->disallow_rehash = value;
 }
 
-/* hashmap_iter functions */
 
+/*
+ * hashmap_iter functions
+ *
+ * Used to iterate over all entries of a hashmap. Note that it is
+ * not safe to add or remove entries to the hashmap while
+ * iterating.
+ */
+
+struct hashmap_iter {
+	struct hashmap *map;
+	struct hashmap_entry *next;
+	unsigned int tablepos;
+};
+
+/* Initializes a `hashmap_iter` structure. */
 extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter);
+
+/* Returns the next hashmap_entry, or NULL if there are no more entries. */
 extern void *hashmap_iter_next(struct hashmap_iter *iter);
+
+/* Initializes the iterator and returns the first entry, if any). */
 static inline void *hashmap_iter_first(struct hashmap *map,
 		struct hashmap_iter *iter)
 {
@@ -115,8 +315,21 @@ static inline void *hashmap_iter_first(struct hashmap *map,
 	return hashmap_iter_next(iter);
 }
 
-/* string interning */
+/* String interning */
 
+/*
+ * Returns the unique, interned version of the specified string or data,
+ * similar to the `String.intern` API in Java and .NET, respectively.
+ * Interned strings remain valid for the entire lifetime of the process.
+ *
+ * Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
+ * strings / data must not be modified or freed.
+ *
+ * Interned strings are best used for short strings with high probability of
+ * duplicates.
+ *
+ * Uses a hashmap to store the pool of interned strings.
+ */
 extern const void *memintern(const void *data, size_t len);
 static inline const char *strintern(const char *string)
 {
-- 
2.13.0.31.g9b732c453e


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] hashmap: migrate documentation from Documentation/technical into header
  2017-06-29  1:13 ` [PATCH 2/2] hashmap: migrate documentation from Documentation/technical into header Stefan Beller
@ 2017-06-29 13:25   ` Jeff Hostetler
  2017-06-29 21:18   ` Jonathan Nieder
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff Hostetler @ 2017-06-29 13:25 UTC (permalink / raw)
  To: Stefan Beller, gitster; +Cc: git



On 6/28/2017 9:13 PM, Stefan Beller wrote:
> While at it, clarify the use of `key`, `keydata`, `entry_or_key` as well
> as documenting the new data pointer for the compare function.
> 
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>   Documentation/technical/api-hashmap.txt | 309 --------------------------------
>   hashmap.h                               | 249 +++++++++++++++++++++++--
>   2 files changed, 231 insertions(+), 327 deletions(-)
>   delete mode 100644 Documentation/technical/api-hashmap.txt
 > ...

This looks very nice. Thank you!

Jeff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] hashmap.h: compare function has access to a data field
  2017-06-29  1:13 ` [PATCH 1/2] hashmap.h: compare function has access to a data field Stefan Beller
@ 2017-06-29 18:06   ` Junio C Hamano
  2017-06-29 18:11     ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-06-29 18:06 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> When using the hashmap a common need is to have access to arbitrary data
> in the compare function. A couple of times we abuse the keydata field
> to pass in the data needed. This happens for example in patch-ids.c.

It is not "arbitrary data"; it is very important to streess that we
are not just passing random crud, but adding a mechansim to
tailor/curry the function in a way that is fixed throughout the
lifetime of a hashmap.

I haven't looked at the use of keydata in patch-ids.c and friends to
decide if that "abuse" claim is correct; if it were the case, should
we expect that a follow-up patch to clean up the existing mess by
using the new mechanism?  Or does fixing the "abuse" take another
mechanism that is different from this one?

> While at it improve the naming of the fields of all compare functions used
> by hashmaps by ensuring unused parameters are prefixed with 'unused_' and
> naming the parameters what they are (instead of 'unused' make it
> 'unused_keydata').
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---

> diff --git a/hashmap.h b/hashmap.h
> index de6022a3a9..1c26bbad5b 100644
> --- a/hashmap.h
> +++ b/hashmap.h
> @@ -33,11 +33,12 @@ struct hashmap_entry {
>  };
>  
>  typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
> -		const void *keydata);
> +		const void *keydata, const void *cbdata);

As I view the new "data" thing the C's (read: poor-man's) way to
cutomize the function, I would have tweaked the function signature
by giving the customization data at the front, i.e.

	fn(fndata, entry, entry_or_key, keydata);

as I think the way we wish to be able to express it in
a better language would have been something like:

	(partial_apply(fn, fndata))(entry, entry_or_key, keydata)

But what you did is OK, too.

> +extern void hashmap_init(struct hashmap *map,
> +			 hashmap_cmp_fn equals_function,
> +			 const void *data,
> +			 size_t initial_size);

And this does match my expectation ;-).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] hashmap.h: compare function has access to a data field
  2017-06-29 18:06   ` Junio C Hamano
@ 2017-06-29 18:11     ` Junio C Hamano
  2017-06-29 18:20       ` Stefan Beller
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-06-29 18:11 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> I haven't looked at the use of keydata in patch-ids.c and friends to
> decide if that "abuse" claim is correct; if it were the case, should
> we expect that a follow-up patch to clean up the existing mess by
> using the new mechanism?  Or does fixing the "abuse" take another
> mechanism that is different from this one?

I see that you corrected patch-ids.c "while at it".  That may make
it harder to revert only that "while at it", I suspect.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] hashmap.h: compare function has access to a data field
  2017-06-29 18:11     ` Junio C Hamano
@ 2017-06-29 18:20       ` Stefan Beller
  2017-06-30 17:26         ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Beller @ 2017-06-29 18:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

On Thu, Jun 29, 2017 at 11:11 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> I haven't looked at the use of keydata in patch-ids.c and friends to
>> decide if that "abuse" claim is correct; if it were the case, should
>> we expect that a follow-up patch to clean up the existing mess by
>> using the new mechanism?  Or does fixing the "abuse" take another
>> mechanism that is different from this one?
>
> I see that you corrected patch-ids.c "while at it".  That may make
> it harder to revert only that "while at it", I suspect.
>
> Thanks.

Yes it was a last minute squash before sending it out, as the fix was only
two lines whereas the conversion is a lot. If it were separated I could have
claimed the introduction to be a rather mechanical patch, but I did not
make use of coccinelle or such, so the likelihood for errors is just as high.

So I decided to squash them.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] hashmap: migrate documentation from Documentation/technical into header
  2017-06-29  1:13 ` [PATCH 2/2] hashmap: migrate documentation from Documentation/technical into header Stefan Beller
  2017-06-29 13:25   ` Jeff Hostetler
@ 2017-06-29 21:18   ` Jonathan Nieder
  2017-06-29 23:41     ` Stefan Beller
  1 sibling, 1 reply; 26+ messages in thread
From: Jonathan Nieder @ 2017-06-29 21:18 UTC (permalink / raw)
  To: Stefan Beller; +Cc: gitster, git

Stefan Beller wrote:

> Subject: hashmap: migrate documentation from Documentation/technical into header
>
> While at it, clarify the use of `key`, `keydata`, `entry_or_key` as well
> as documenting the new data pointer for the compare function.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  Documentation/technical/api-hashmap.txt | 309 --------------------------------
>  hashmap.h                               | 249 +++++++++++++++++++++++--
>  2 files changed, 231 insertions(+), 327 deletions(-)
>  delete mode 100644 Documentation/technical/api-hashmap.txt

Yay, I love this.

[...]
> --- a/Documentation/technical/api-hashmap.txt
> +++ /dev/null
> @@ -1,309 +0,0 @@

Verified that these docs have all been migrated to the header, except
where noted below.

[...]
> -`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
> -
> -	User-supplied function to test two hashmap entries for equality. Shall
> -	return 0 if the entries are equal.
> -+
> -This function is always called with non-NULL `entry` / `entry_or_key`
> -parameters that have the same hash code. When looking up an entry, the `key`
> -and `keydata` parameters to hashmap_get and hashmap_remove are always passed
> -as second and third argument, respectively. Otherwise, `keydata` is NULL.

This was really heard to read in the preimage.  Thanks for cleaning it up.

[...]
> -Usage example
> --------------
> -
> -Here's a simple usage example that maps long keys to double values.

What happened to this?  I think it would be useful to include in a
long comment at the top of the header.  A worked example like this
makes it easier to get one's bearings and know which functions to look
at.

[...]
> -Using variable-sized keys
> --------------------------

Same question: should this discussion go in the hashmap_get()
docstring, with a pointer from hashmap_remove()?  Or should it go in
test-hashmap.c, with a pointer in the docstrings of both?

[...]
> +++ b/hashmap.h
> @@ -3,11 +3,18 @@
>  
>  /*
>   * Generic implementation of hash-based key-value mappings.
> - * See Documentation/technical/api-hashmap.txt.
> + *
>   */

nit: why the blank line?

>  
> -/* FNV-1 functions */
> -
> +/*
> + * Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
> + * http://www.isthe.com/chongo/tech/comp/fnv).
> + * `strhash` and `strihash` take 0-terminated strings, while `memhash` and
> + * `memihash` operate on arbitrary-length memory.
> + * `strihash` and `memihash` are case insensitive versions.
> + * `memihash_cont` is a variant of `memihash` that allows a computation to be
> + * continued with another chunk of data.
> + */
>  extern unsigned int strhash(const char *buf);
>  extern unsigned int strihash(const char *buf);
>  extern unsigned int memhash(const void *buf, size_t len);
> @@ -16,6 +23,15 @@ extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_
>  
>  static inline unsigned int sha1hash(const unsigned char *sha1)
>  {
> +	/*
> +	 * Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
> +	 * for use in hash tables. Cryptographic hashes are supposed to have
> +	 * uniform distribution, so in contrast to `memhash()`, this just copies
> +	 * the first `sizeof(int)` bytes without shuffling any bits. Note that
> +	 * the results will be different on big-endian and little-endian
> +	 * platforms, so they should not be stored or transferred over the net.
> +	 */

This comment should go in front of the function instead of in its body, like this:

	/*
	 * Converts a crypto [...]
	 */
	static inline unsigned int sha1hash(...

because it is describing the purpose and usage of the function and not
its implementation.

[...]
> +/*
> + * User-supplied function to test two hashmap entries for equality. Shall
> + * return 0 if the entries are equal.
> + *
> + * This function is always called with non-NULL `entry` and `entry_or_key`
> + * parameters that have the same hash code.
> + *
> + * When looking up an entry, the `key` and `keydata` parameters to hashmap_get
> + * and hashmap_remove are always passed
> + * as second `entry_or_key` and third argument `keydata`, respectively.
> + * Otherwise, `keydata` is NULL.

strange wrapping

> + *
> + * There are two modes for this function to be used:
> + * (a) When looking for an exact match of a given `entry`, then `keydata`
> + *     ought to be NULL, and this function should cast `entry` and
> + *     `entry_or_key` to the user entries and check for equality.

What does it mean that `keydata` ought to be NULL?  Are you saying it
will be NULL, or that someone has to ensure it's NULL, or something
else?

> + * (b) When it is too expensive to allocate such a user entry (either because
> + *     it is large or varialbe sized, such that it is not on the stack),
> + *     Then the relevant data to check for equality should be passed via
> + *     ` keydata`.

stray space in "` keydata`".

> + *
> + * Resulting from these modes, this function should compare `keydata` to `entry`
> + * when `keydata` is not NULL. `entry_or_key` may be a bogus hashmap_entry (see
> + * hashmap_get_from_hash).

What is meant by a bogus hashmap_entry here?  (See also below.)

[...]
> +/*
> + * struct hashmap is the hash table structure. Members can be used as follows,
> + * but should not be modified directly.
> + */
>  struct hashmap {
[...]
> +	/* Prevent automatic rehashes during inserts and deletes when set. */
> +	unsigned disallow_rehash : 1;
>  };

Please add a reference to hashmap_disallow_rehash so people know how
to set this.

>  
>  /* hashmap functions */
>  
> +/*
> + * Initializes a hashmap structure.
> + *
> + * `map` is the hashmap to initialize.
> + *
> + * The `equals_function` can be specified to compare two entries for equality.
> + * If NULL, entries are considered equal if their hash codes are equal.
> + *
> + * When the compare function needs specific user supplied `data`, it should

nit: it's not user supplied so much as caller supplied.  Maybe this
can say something like

	The `data` parameter can be used to provide additional data (a callback
	cookie) that will be passed to `equals_function` each time it is called.
	This allows a single `equals_function` to implement multiple comparison
	functions.

[...]
> +/*
> + * Frees a hashmap structure and allocated memory.
> + *
> + * If `free_entries` is true, each hashmap_entry in the map is freed as well.
> + */

nit: Documentation/technical said the hashmap_entrys get freed using
free().  Can't hurt to carry that note over, to save the reader from
hunting down whether the struct hashmap allows setting a custom free
function.

[...]
> +/*
> + * Returns the hashmap entry for the specified key, or NULL if not found.
> + *
> + * `map` is the hashmap structure.
> + * `key` is a user data structure that starts with hashmap_entry that has at
> + *  least been initialized with the proper hash code (via `hashmap_entry_init`).

nit: strange alignment (extra space before "least").

Should these argument lists be formatted as a list somehow (e.g. leading
"- " bullets, or with a blank line between them?  (I don't have a
strong opinion, just noticed they can be hard to read with everything
run together.)

> + * `keydata` is a data structure that holds just enough information to check
> + * for equality to a given entry.
> + *
> + * One of `key` or `keydata` shall be NULL.
> + *
> + * If an entry with matching hash code is found, `key` and `keydata` are passed
> + * to `hashmap_cmp_fn` to decide whether the entry matches the key.
> + */
>  extern void *hashmap_get(const struct hashmap *map, const void *key,
[...]
> + * Removes a hashmap entry matching the specified key. If the hashmap
> + * contains duplicate entries equal to the specified key, only one of
> + * them will be removed.
> + *
> + * `map` is the hashmap structure.
> + * `key` is a user data structure that starts with hashmap_entry that has at
> + *  least been initialized with the proper hash code (via `hashmap_entry_init`).

nit: strange alignment again

[...]
>  extern void *hashmap_remove(struct hashmap *map, const void *key,
[...]
> +/*
> + * Returns the hashmap entry for the specified hash code and key data,
> + * or NULL if not found.
> + *
> + * `map` is the hashmap structure.
> + * `hash` is the hash code of the entry to look up.
> + *
> + * If an entry with matching hash code is found, `keydata` is passed to
> + * `hashmap_cmp_fn` to decide whether the entry matches the key. The
> + * `entry_or_key` parameter points to a bogus hashmap_entry structure that
> + * should not be used in the comparison.

What is going on with this note about bogus entry_or_key?  I have no
idea what it is trying to say, neither in the preimage nor the
postimage.

> + */
>  static inline void *hashmap_get_from_hash(const struct hashmap *map,
> -		unsigned int hash, const void *keydata)
> +					  unsigned int hash,
> +					  const void *keydata)

In Documentation/technical, hashmap_get_from_hash was immediately
after hashmap_get, making it more discoverable.

Should the descriptions of these two functions mention each other to
bring about a similar effect?

[...]
> @@ -91,7 +273,7 @@ int hashmap_bucket(const struct hashmap *map, unsigned int hash);
>   * manner appropriate to their usage.  This simply
>   * prevents the table from being unexpectedly re-mapped.
>   *
> - * If is up to the caller to ensure that the hashmap is
> + * It is up to the caller to ensure that the hashmap is
>   * initialized to a reasonable size to prevent poor
>   * performance.
>   *

The Documentation/technical/ text says:

	A call to allow rehashing does not force a rehash; that might happen
	with the next insert or delete.

which I find easier on the eyes than

	 * When value=0, allow future rehahses.  This DOES NOT force
	 * a rehash now.

Why not make this comment use the full description from
Documentation/technical/ verbatim?

> @@ -104,10 +286,28 @@ static inline void hashmap_disallow_rehash(struct hashmap *map, unsigned value)
[...]
> -/* hashmap_iter functions */
>  
> +/*
> + * hashmap_iter functions
> + *
> + * Used to iterate over all entries of a hashmap. Note that it is
> + * not safe to add or remove entries to the hashmap while
> + * iterating.
> + */
> +
> +struct hashmap_iter {

nit: might be simpler to associate this comment with the struct:

	/*
	 * Used to iterate over ...
	 */
	struct hashmap_iter {
		...
	}

That way, the scope of what the comment is describing is clearer.
When looking up the functions, I'm likely to look up the hashmap_iter
type they use, so I'd end up reading this comment anyway.

> +	struct hashmap *map;
> +	struct hashmap_entry *next;
> +	unsigned int tablepos;
> +};
> +
> +/* Initializes a `hashmap_iter` structure. */
>  extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter);
> +
> +/* Returns the next hashmap_entry, or NULL if there are no more entries. */
>  extern void *hashmap_iter_next(struct hashmap_iter *iter);
> +
> +/* Initializes the iterator and returns the first entry, if any). */

Stray paren?

Probably worth mentioning this is a convenience wrapper for iter_init
+ iter_next, like the Documentation/technical/ text did.

>  static inline void *hashmap_iter_first(struct hashmap *map,

The rest looks good.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] hashmap: migrate documentation from Documentation/technical into header
  2017-06-29 21:18   ` Jonathan Nieder
@ 2017-06-29 23:41     ` Stefan Beller
  0 siblings, 0 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-29 23:41 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, git@vger.kernel.org

Took all suggestions so far, but the last one:

>
> Probably worth mentioning this is a convenience wrapper for iter_init
> + iter_next, like the Documentation/technical/ text did.

No. The code to observe that this is a convenience wrapper
IS RIGHT THERE (it's a header file), and I find this header a
bit bloated already. So I'll keep it as a one liner.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCHv2 0/2] Introduce data field in hashmap and migrate docs to header
  2017-06-29  1:13 [PATCH 0/2] Introduce data field in hashmap and migrate docs to header Stefan Beller
  2017-06-29  1:13 ` [PATCH 1/2] hashmap.h: compare function has access to a data field Stefan Beller
  2017-06-29  1:13 ` [PATCH 2/2] hashmap: migrate documentation from Documentation/technical into header Stefan Beller
@ 2017-06-29 23:53 ` Stefan Beller
  2017-06-29 23:53   ` [PATCHv2 1/2] hashmap.h: compare function has access to a data field Stefan Beller
                     ` (2 more replies)
  2 siblings, 3 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-29 23:53 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jrnieder, git

v2:

addressed all but the last point of Jonathan Nieder.

Thanks,
Stefan

v1:

https://public-inbox.org/git/xmqqpodnvmmw.fsf@gitster.mtv.corp.google.com/
for context why we need a new data field.  Implement that.

Once upon a time we had a long discussion where to put documentation best.
The answer was header files as there documentation has less chance
to become stale and be out of date.  Improve the docs by
* migrating them to the header
* clarifying how the compare function is to be used
* how the arguments to hashmap_get/remove should be used.

Thanks,
Stefan

Stefan Beller (2):
  hashmap.h: compare function has access to a data field
  hashmap: migrate documentation from Documentation/technical into
    header

 Documentation/technical/api-hashmap.txt | 309 ---------------------------
 attr.c                                  |   4 +-
 builtin/describe.c                      |   6 +-
 builtin/difftool.c                      |  20 +-
 builtin/fast-export.c                   |   5 +-
 config.c                                |   7 +-
 convert.c                               |   3 +-
 diffcore-rename.c                       |   2 +-
 hashmap.c                               |  17 +-
 hashmap.h                               | 358 ++++++++++++++++++++++++++++----
 name-hash.c                             |  12 +-
 oidset.c                                |   5 +-
 patch-ids.c                             |   6 +-
 refs.c                                  |   4 +-
 remote.c                                |   7 +-
 sha1_file.c                             |   5 +-
 sub-process.c                           |   5 +-
 sub-process.h                           |   6 +-
 submodule-config.c                      |  10 +-
 t/helper/test-hashmap.c                 |  15 +-
 20 files changed, 412 insertions(+), 394 deletions(-)
 delete mode 100644 Documentation/technical/api-hashmap.txt

-- 
2.13.0.31.g9b732c453e


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCHv2 1/2] hashmap.h: compare function has access to a data field
  2017-06-29 23:53 ` [PATCHv2 0/2] Introduce data field in hashmap and migrate docs to header Stefan Beller
@ 2017-06-29 23:53   ` Stefan Beller
  2017-06-30 17:34     ` Junio C Hamano
  2017-06-30 17:39     ` Junio C Hamano
  2017-06-29 23:53   ` [PATCHv2 2/2] hashmap: migrate documentation from Documentation/technical into header Stefan Beller
  2017-06-30 19:14   ` [PATCHv3 0/3] Introduce data field in hashmap and migrate docs to header Stefan Beller
  2 siblings, 2 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-29 23:53 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jrnieder, git

When using the hashmap a common need is to have access to arbitrary data
in the compare function. A couple of times we abuse the keydata field
to pass in the data needed. This happens for example in patch-ids.c.

This patch changes the function signature of the compare function
to have one more void pointer available. The pointer given for each
invocation of the compare function must be defined in the init function
of the hashmap and is just passed through.

Documentation of this new feature is deferred to a later patch.

While at it improve the naming of the fields of all compare functions used
by hashmaps by ensuring unused parameters are prefixed with 'unused_' and
naming the parameters what they are (instead of 'unused' make it
'unused_keydata').

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 attr.c                  |  4 ++--
 builtin/describe.c      |  6 ++++--
 builtin/difftool.c      | 20 ++++++++++++--------
 builtin/fast-export.c   |  5 +++--
 config.c                |  7 +++++--
 convert.c               |  3 ++-
 diffcore-rename.c       |  2 +-
 hashmap.c               | 17 ++++++++++++-----
 hashmap.h               |  9 ++++++---
 name-hash.c             | 12 ++++++++----
 oidset.c                |  5 +++--
 patch-ids.c             |  6 ++++--
 refs.c                  |  4 ++--
 remote.c                |  7 +++++--
 sha1_file.c             |  5 +++--
 sub-process.c           |  5 +++--
 sub-process.h           |  6 ++++--
 submodule-config.c      | 10 ++++++----
 t/helper/test-hashmap.c | 15 ++++++++++-----
 19 files changed, 95 insertions(+), 53 deletions(-)

diff --git a/attr.c b/attr.c
index 37454999d2..2f51417675 100644
--- a/attr.c
+++ b/attr.c
@@ -78,7 +78,7 @@ struct attr_hash_entry {
 /* attr_hashmap comparison function */
 static int attr_hash_entry_cmp(const struct attr_hash_entry *a,
 			       const struct attr_hash_entry *b,
-			       void *unused)
+			       void *unused_keydata, void *unused_data)
 {
 	return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
 }
@@ -86,7 +86,7 @@ static int attr_hash_entry_cmp(const struct attr_hash_entry *a,
 /* Initialize an 'attr_hashmap' object */
 static void attr_hashmap_init(struct attr_hashmap *map)
 {
-	hashmap_init(&map->map, (hashmap_cmp_fn) attr_hash_entry_cmp, 0);
+	hashmap_init(&map->map, (hashmap_cmp_fn) attr_hash_entry_cmp, NULL, 0);
 }
 
 /*
diff --git a/builtin/describe.c b/builtin/describe.c
index 70eb144608..a6c5a969a0 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -55,7 +55,9 @@ static const char *prio_names[] = {
 };
 
 static int commit_name_cmp(const struct commit_name *cn1,
-		const struct commit_name *cn2, const void *peeled)
+			   const struct commit_name *cn2,
+			   const void *peeled,
+			   const void *unused_data)
 {
 	return oidcmp(&cn1->peeled, peeled ? peeled : &cn2->peeled);
 }
@@ -501,7 +503,7 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 		return cmd_name_rev(args.argc, args.argv, prefix);
 	}
 
-	hashmap_init(&names, (hashmap_cmp_fn) commit_name_cmp, 0);
+	hashmap_init(&names, (hashmap_cmp_fn) commit_name_cmp, NULL, 0);
 	for_each_rawref(get_name, NULL);
 	if (!names.size && !always)
 		die(_("No names found, cannot describe anything."));
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 9199227f6e..80786a95ab 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -131,7 +131,9 @@ struct working_tree_entry {
 };
 
 static int working_tree_entry_cmp(struct working_tree_entry *a,
-				  struct working_tree_entry *b, void *keydata)
+				  struct working_tree_entry *b,
+				  void *unused_keydata,
+				  void *unused_data)
 {
 	return strcmp(a->path, b->path);
 }
@@ -146,7 +148,8 @@ struct pair_entry {
 	const char path[FLEX_ARRAY];
 };
 
-static int pair_cmp(struct pair_entry *a, struct pair_entry *b, void *keydata)
+static int pair_cmp(struct pair_entry *a, struct pair_entry *b,
+		    void *unused_keydata, void *unused_data)
 {
 	return strcmp(a->path, b->path);
 }
@@ -174,7 +177,8 @@ struct path_entry {
 	char path[FLEX_ARRAY];
 };
 
-static int path_entry_cmp(struct path_entry *a, struct path_entry *b, void *key)
+static int path_entry_cmp(struct path_entry *a, struct path_entry *b,
+			  void *key, void *unused_data)
 {
 	return strcmp(a->path, key ? key : b->path);
 }
@@ -367,9 +371,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	wtdir_len = wtdir.len;
 
 	hashmap_init(&working_tree_dups,
-		     (hashmap_cmp_fn)working_tree_entry_cmp, 0);
-	hashmap_init(&submodules, (hashmap_cmp_fn)pair_cmp, 0);
-	hashmap_init(&symlinks2, (hashmap_cmp_fn)pair_cmp, 0);
+		     (hashmap_cmp_fn)working_tree_entry_cmp, NULL, 0);
+	hashmap_init(&submodules, (hashmap_cmp_fn)pair_cmp, NULL, 0);
+	hashmap_init(&symlinks2, (hashmap_cmp_fn)pair_cmp, NULL, 0);
 
 	child.no_stdin = 1;
 	child.git_cmd = 1;
@@ -580,9 +584,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	 * files through the symlink.
 	 */
 	hashmap_init(&wt_modified, (hashmap_cmp_fn)path_entry_cmp,
-		     wtindex.cache_nr);
+		     NULL, wtindex.cache_nr);
 	hashmap_init(&tmp_modified, (hashmap_cmp_fn)path_entry_cmp,
-		     wtindex.cache_nr);
+		     NULL, wtindex.cache_nr);
 
 	for (i = 0; i < wtindex.cache_nr; i++) {
 		struct hashmap_entry dummy;
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 12d501bfde..fa389fe252 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -94,7 +94,8 @@ struct anonymized_entry {
 };
 
 static int anonymized_entry_cmp(const void *va, const void *vb,
-				const void *data)
+				const void *unused_keydata,
+				const void *unused_data)
 {
 	const struct anonymized_entry *a = va, *b = vb;
 	return a->orig_len != b->orig_len ||
@@ -113,7 +114,7 @@ static const void *anonymize_mem(struct hashmap *map,
 	struct anonymized_entry key, *ret;
 
 	if (!map->cmpfn)
-		hashmap_init(map, anonymized_entry_cmp, 0);
+		hashmap_init(map, anonymized_entry_cmp, NULL, 0);
 
 	hashmap_entry_init(&key, memhash(orig, *len));
 	key.orig = orig;
diff --git a/config.c b/config.c
index 1cd40a5fe6..aa0198a5fd 100644
--- a/config.c
+++ b/config.c
@@ -1754,14 +1754,17 @@ static int configset_add_value(struct config_set *cs, const char *key, const cha
 }
 
 static int config_set_element_cmp(const struct config_set_element *e1,
-				 const struct config_set_element *e2, const void *unused)
+				 const struct config_set_element *e2,
+				 const void *unused_keydata,
+				 const void *unused_data)
 {
 	return strcmp(e1->key, e2->key);
 }
 
 void git_configset_init(struct config_set *cs)
 {
-	hashmap_init(&cs->config_hash, (hashmap_cmp_fn)config_set_element_cmp, 0);
+	hashmap_init(&cs->config_hash, (hashmap_cmp_fn)config_set_element_cmp,
+		     NULL, 0);
 	cs->hash_initialized = 1;
 	cs->list.nr = 0;
 	cs->list.alloc = 0;
diff --git a/convert.c b/convert.c
index 7d2a519daf..deaf0ba7b3 100644
--- a/convert.c
+++ b/convert.c
@@ -583,7 +583,8 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 
 	if (!subprocess_map_initialized) {
 		subprocess_map_initialized = 1;
-		hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp,
+			     NULL, 0);
 		entry = NULL;
 	} else {
 		entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd);
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1e4678b7d7..786f389498 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -341,7 +341,7 @@ static int find_exact_renames(struct diff_options *options)
 	/* Add all sources to the hash table in reverse order, because
 	 * later on they will be retrieved in LIFO order.
 	 */
-	hashmap_init(&file_table, NULL, rename_src_nr);
+	hashmap_init(&file_table, NULL, NULL, rename_src_nr);
 	for (i = rename_src_nr-1; i >= 0; i--)
 		insert_file_table(&file_table, i, rename_src[i].p->one);
 
diff --git a/hashmap.c b/hashmap.c
index 7d1044eb5d..82d70b329b 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -95,7 +95,9 @@ static inline int entry_equals(const struct hashmap *map,
 		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
 		const void *keydata)
 {
-	return (e1 == e2) || (e1->hash == e2->hash && !map->cmpfn(e1, e2, keydata));
+	return (e1 == e2) ||
+	       (e1->hash == e2->hash &&
+		!map->cmpfn(e1, e2, keydata, map->data));
 }
 
 static inline unsigned int bucket(const struct hashmap *map,
@@ -140,19 +142,23 @@ static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
 	return e;
 }
 
-static int always_equal(const void *unused1, const void *unused2, const void *unused3)
+static int always_equal(const void *unused1,
+			const void *unused2,
+			const void *unused_keydata,
+			const void *unused_data)
 {
 	return 0;
 }
 
 void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		size_t initial_size)
+		const void *data, size_t initial_size)
 {
 	unsigned int size = HASHMAP_INITIAL_SIZE;
 
 	memset(map, 0, sizeof(*map));
 
 	map->cmpfn = equals_function ? equals_function : always_equal;
+	map->data = data;
 
 	/* calculate initial table size and allocate the table */
 	initial_size = (unsigned int) ((uint64_t) initial_size * 100
@@ -262,7 +268,8 @@ struct pool_entry {
 
 static int pool_entry_cmp(const struct pool_entry *e1,
 			  const struct pool_entry *e2,
-			  const unsigned char *keydata)
+			  const unsigned char *keydata,
+			  const void *unused_data)
 {
 	return e1->data != keydata &&
 	       (e1->len != e2->len || memcmp(e1->data, keydata, e1->len));
@@ -275,7 +282,7 @@ const void *memintern(const void *data, size_t len)
 
 	/* initialize string pool hashmap */
 	if (!map.tablesize)
-		hashmap_init(&map, (hashmap_cmp_fn) pool_entry_cmp, 0);
+		hashmap_init(&map, (hashmap_cmp_fn) pool_entry_cmp, NULL, 0);
 
 	/* lookup interned string in pool */
 	hashmap_entry_init(&key, memhash(data, len));
diff --git a/hashmap.h b/hashmap.h
index de6022a3a9..1c26bbad5b 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -33,11 +33,12 @@ struct hashmap_entry {
 };
 
 typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
-		const void *keydata);
+		const void *keydata, const void *cbdata);
 
 struct hashmap {
 	struct hashmap_entry **table;
 	hashmap_cmp_fn cmpfn;
+	const void *data;
 	unsigned int size, tablesize, grow_at, shrink_at;
 	unsigned disallow_rehash : 1;
 };
@@ -50,8 +51,10 @@ struct hashmap_iter {
 
 /* hashmap functions */
 
-extern void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		size_t initial_size);
+extern void hashmap_init(struct hashmap *map,
+			 hashmap_cmp_fn equals_function,
+			 const void *data,
+			 size_t initial_size);
 extern void hashmap_free(struct hashmap *map, int free_entries);
 
 /* hashmap_entry functions */
diff --git a/name-hash.c b/name-hash.c
index 39309efb7f..4e9eac925a 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -17,7 +17,9 @@ struct dir_entry {
 };
 
 static int dir_entry_cmp(const struct dir_entry *e1,
-		const struct dir_entry *e2, const char *name)
+			 const struct dir_entry *e2,
+			 const char *name,
+			 const void *unused_data)
 {
 	return e1->namelen != e2->namelen || strncasecmp(e1->name,
 			name ? name : e2->name, e1->namelen);
@@ -108,7 +110,9 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 }
 
 static int cache_entry_cmp(const struct cache_entry *ce1,
-		const struct cache_entry *ce2, const void *remove)
+			   const struct cache_entry *ce2,
+			   const void *remove,
+			   const void *unused_data)
 {
 	/*
 	 * For remove_name_hash, find the exact entry (pointer equality); for
@@ -571,9 +575,9 @@ static void lazy_init_name_hash(struct index_state *istate)
 	if (istate->name_hash_initialized)
 		return;
 	hashmap_init(&istate->name_hash, (hashmap_cmp_fn) cache_entry_cmp,
-			istate->cache_nr);
+			NULL, istate->cache_nr);
 	hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp,
-			istate->cache_nr);
+			NULL, istate->cache_nr);
 
 	if (lookup_lazy_params(istate)) {
 		hashmap_disallow_rehash(&istate->dir_hash, 1);
diff --git a/oidset.c b/oidset.c
index ac169f05d3..f20e6e4df3 100644
--- a/oidset.c
+++ b/oidset.c
@@ -7,7 +7,8 @@ struct oidset_entry {
 };
 
 static int oidset_hashcmp(const void *va, const void *vb,
-			  const void *vkey)
+			  const void *vkey,
+			  const void *unused_data)
 {
 	const struct oidset_entry *a = va, *b = vb;
 	const struct object_id *key = vkey;
@@ -30,7 +31,7 @@ int oidset_insert(struct oidset *set, const struct object_id *oid)
 	struct oidset_entry *entry;
 
 	if (!set->map.cmpfn)
-		hashmap_init(&set->map, oidset_hashcmp, 0);
+		hashmap_init(&set->map, oidset_hashcmp, NULL, 0);
 
 	if (oidset_contains(set, oid))
 		return 1;
diff --git a/patch-ids.c b/patch-ids.c
index 9c0ab9e67a..b9b2ebbad0 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -37,6 +37,7 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
  */
 static int patch_id_cmp(struct patch_id *a,
 			struct patch_id *b,
+			const void *unused_keydata,
 			struct diff_options *opt)
 {
 	if (is_null_oid(&a->patch_id) &&
@@ -57,7 +58,8 @@ int init_patch_ids(struct patch_ids *ids)
 	ids->diffopts.detect_rename = 0;
 	DIFF_OPT_SET(&ids->diffopts, RECURSIVE);
 	diff_setup_done(&ids->diffopts);
-	hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp, 256);
+	hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp,
+		     &ids->diffopts, 256);
 	return 0;
 }
 
@@ -93,7 +95,7 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
 	if (init_patch_id_entry(&patch, commit, ids))
 		return NULL;
 
-	return hashmap_get(&ids->patches, &patch, &ids->diffopts);
+	return hashmap_get(&ids->patches, &patch, NULL);
 }
 
 struct patch_id *add_commit_patch_id(struct commit *commit,
diff --git a/refs.c b/refs.c
index 88658ba769..71e0170feb 100644
--- a/refs.c
+++ b/refs.c
@@ -1526,7 +1526,7 @@ struct ref_store_hash_entry
 };
 
 static int ref_store_hash_cmp(const void *entry, const void *entry_or_key,
-			      const void *keydata)
+			      const void *keydata, const void *unused_data)
 {
 	const struct ref_store_hash_entry *e1 = entry, *e2 = entry_or_key;
 	const char *name = keydata ? keydata : e2->name;
@@ -1608,7 +1608,7 @@ static void register_ref_store_map(struct hashmap *map,
 				   const char *name)
 {
 	if (!map->tablesize)
-		hashmap_init(map, ref_store_hash_cmp, 0);
+		hashmap_init(map, ref_store_hash_cmp, NULL, 0);
 
 	if (hashmap_put(map, alloc_ref_store_hash_entry(name, refs)))
 		die("BUG: %s ref_store '%s' initialized twice", type, name);
diff --git a/remote.c b/remote.c
index d87482573d..b5e05890ee 100644
--- a/remote.c
+++ b/remote.c
@@ -133,7 +133,10 @@ struct remotes_hash_key {
 	int len;
 };
 
-static int remotes_hash_cmp(const struct remote *a, const struct remote *b, const struct remotes_hash_key *key)
+static int remotes_hash_cmp(const struct remote *a,
+			    const struct remote *b,
+			    const struct remotes_hash_key *key,
+			    const void *unused_data)
 {
 	if (key)
 		return strncmp(a->name, key->str, key->len) || a->name[key->len];
@@ -144,7 +147,7 @@ static int remotes_hash_cmp(const struct remote *a, const struct remote *b, cons
 static inline void init_remotes_hash(void)
 {
 	if (!remotes_hash.cmpfn)
-		hashmap_init(&remotes_hash, (hashmap_cmp_fn)remotes_hash_cmp, 0);
+		hashmap_init(&remotes_hash, (hashmap_cmp_fn)remotes_hash_cmp, NULL, 0);
 }
 
 static struct remote *make_remote(const char *name, int len)
diff --git a/sha1_file.c b/sha1_file.c
index fb1fd809dc..b748db31ed 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2390,7 +2390,8 @@ static int delta_base_cache_key_eq(const struct delta_base_cache_key *a,
 }
 
 static int delta_base_cache_hash_cmp(const void *va, const void *vb,
-				     const void *vkey)
+				     const void *vkey,
+				     const void *unused_data)
 {
 	const struct delta_base_cache_entry *a = va, *b = vb;
 	const struct delta_base_cache_key *key = vkey;
@@ -2472,7 +2473,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	list_add_tail(&ent->lru, &delta_base_cache_lru);
 
 	if (!delta_base_cache.cmpfn)
-		hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, 0);
+		hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, NULL, 0);
 	hashmap_entry_init(ent, pack_entry_hash(p, base_offset));
 	hashmap_add(&delta_base_cache, ent);
 }
diff --git a/sub-process.c b/sub-process.c
index 92f8aea70a..68b53cf48a 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -6,8 +6,9 @@
 #include "pkt-line.h"
 
 int cmd2process_cmp(const struct subprocess_entry *e1,
-			   const struct subprocess_entry *e2,
-			   const void *unused)
+		    const struct subprocess_entry *e2,
+		    const void *unused_keydata,
+		    const void *unused_data)
 {
 	return strcmp(e1->cmd, e2->cmd);
 }
diff --git a/sub-process.h b/sub-process.h
index d9a45cd359..a19b4d9936 100644
--- a/sub-process.h
+++ b/sub-process.h
@@ -20,8 +20,10 @@ struct subprocess_entry {
 
 /* subprocess functions */
 
-int cmd2process_cmp(const struct subprocess_entry *e1,
-	const struct subprocess_entry *e2, const void *unused);
+extern int cmd2process_cmp(const struct subprocess_entry *e1,
+			   const struct subprocess_entry *e2,
+			   const void *unused_keydata,
+			   const void *unused_data);
 
 typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
 int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
diff --git a/submodule-config.c b/submodule-config.c
index d8f8d5ea32..5c2fc489b8 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -36,7 +36,8 @@ static int is_cache_init;
 
 static int config_path_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
-			   const void *unused)
+			   const void *unused_keydata,
+			   const void *unused_data)
 {
 	return strcmp(a->config->path, b->config->path) ||
 	       hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
@@ -44,7 +45,8 @@ static int config_path_cmp(const struct submodule_entry *a,
 
 static int config_name_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
-			   const void *unused)
+			   const void *unused_keydata,
+			   const void *unused_data)
 {
 	return strcmp(a->config->name, b->config->name) ||
 	       hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
@@ -52,8 +54,8 @@ static int config_name_cmp(const struct submodule_entry *a,
 
 static void cache_init(struct submodule_cache *cache)
 {
-	hashmap_init(&cache->for_path, (hashmap_cmp_fn) config_path_cmp, 0);
-	hashmap_init(&cache->for_name, (hashmap_cmp_fn) config_name_cmp, 0);
+	hashmap_init(&cache->for_path, (hashmap_cmp_fn) config_path_cmp, NULL, 0);
+	hashmap_init(&cache->for_name, (hashmap_cmp_fn) config_name_cmp, NULL, 0);
 }
 
 static void free_one_config(struct submodule_entry *entry)
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index 7aa9440e27..72cdc4a6b1 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -14,13 +14,17 @@ static const char *get_value(const struct test_entry *e)
 }
 
 static int test_entry_cmp(const struct test_entry *e1,
-		const struct test_entry *e2, const char* key)
+			  const struct test_entry *e2,
+			  const char* key,
+			  const void *unused_data)
 {
 	return strcmp(e1->key, key ? key : e2->key);
 }
 
 static int test_entry_cmp_icase(const struct test_entry *e1,
-		const struct test_entry *e2, const char* key)
+				const struct test_entry *e2,
+				const char* key,
+				const void *unused_data)
 {
 	return strcasecmp(e1->key, key ? key : e2->key);
 }
@@ -92,7 +96,8 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 	if (method & TEST_ADD) {
 		/* test adding to the map */
 		for (j = 0; j < rounds; j++) {
-			hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+			hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp,
+				     NULL, 0);
 
 			/* add entries */
 			for (i = 0; i < TEST_SIZE; i++) {
@@ -104,7 +109,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 		}
 	} else {
 		/* test map lookups */
-		hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+		hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, NULL, 0);
 
 		/* fill the map (sparsely if specified) */
 		j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE;
@@ -147,7 +152,7 @@ int cmd_main(int argc, const char **argv)
 	/* init hash map */
 	icase = argc > 1 && !strcmp("ignorecase", argv[1]);
 	hashmap_init(&map, (hashmap_cmp_fn) (icase ? test_entry_cmp_icase
-			: test_entry_cmp), 0);
+			: test_entry_cmp), NULL, 0);
 
 	/* process commands from stdin */
 	while (fgets(line, sizeof(line), stdin)) {
-- 
2.13.0.31.g9b732c453e


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv2 2/2] hashmap: migrate documentation from Documentation/technical into header
  2017-06-29 23:53 ` [PATCHv2 0/2] Introduce data field in hashmap and migrate docs to header Stefan Beller
  2017-06-29 23:53   ` [PATCHv2 1/2] hashmap.h: compare function has access to a data field Stefan Beller
@ 2017-06-29 23:53   ` Stefan Beller
  2017-06-30 19:14   ` [PATCHv3 0/3] Introduce data field in hashmap and migrate docs to header Stefan Beller
  2 siblings, 0 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-29 23:53 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jrnieder, git

While at it, clarify the use of `key`, `keydata`, `entry_or_key` as well
as documenting the new data pointer for the compare function.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/technical/api-hashmap.txt | 309 ----------------------------
 hashmap.h                               | 351 +++++++++++++++++++++++++++++---
 2 files changed, 318 insertions(+), 342 deletions(-)
 delete mode 100644 Documentation/technical/api-hashmap.txt

diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
deleted file mode 100644
index ccc634bbd7..0000000000
--- a/Documentation/technical/api-hashmap.txt
+++ /dev/null
@@ -1,309 +0,0 @@
-hashmap API
-===========
-
-The hashmap API is a generic implementation of hash-based key-value mappings.
-
-Data Structures
----------------
-
-`struct hashmap`::
-
-	The hash table structure. Members can be used as follows, but should
-	not be modified directly:
-+
-The `size` member keeps track of the total number of entries (0 means the
-hashmap is empty).
-+
-`tablesize` is the allocated size of the hash table. A non-0 value indicates
-that the hashmap is initialized. It may also be useful for statistical purposes
-(i.e. `size / tablesize` is the current load factor).
-+
-`cmpfn` stores the comparison function specified in `hashmap_init()`. In
-advanced scenarios, it may be useful to change this, e.g. to switch between
-case-sensitive and case-insensitive lookup.
-+
-When `disallow_rehash` is set, automatic rehashes are prevented during inserts
-and deletes.
-
-`struct hashmap_entry`::
-
-	An opaque structure representing an entry in the hash table, which must
-	be used as first member of user data structures. Ideally it should be
-	followed by an int-sized member to prevent unused memory on 64-bit
-	systems due to alignment.
-+
-The `hash` member is the entry's hash code and the `next` member points to the
-next entry in case of collisions (i.e. if multiple entries map to the same
-bucket).
-
-`struct hashmap_iter`::
-
-	An iterator structure, to be used with hashmap_iter_* functions.
-
-Types
------
-
-`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
-
-	User-supplied function to test two hashmap entries for equality. Shall
-	return 0 if the entries are equal.
-+
-This function is always called with non-NULL `entry` / `entry_or_key`
-parameters that have the same hash code. When looking up an entry, the `key`
-and `keydata` parameters to hashmap_get and hashmap_remove are always passed
-as second and third argument, respectively. Otherwise, `keydata` is NULL.
-
-Functions
----------
-
-`unsigned int strhash(const char *buf)`::
-`unsigned int strihash(const char *buf)`::
-`unsigned int memhash(const void *buf, size_t len)`::
-`unsigned int memihash(const void *buf, size_t len)`::
-`unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len)`::
-
-	Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
-	http://www.isthe.com/chongo/tech/comp/fnv).
-+
-`strhash` and `strihash` take 0-terminated strings, while `memhash` and
-`memihash` operate on arbitrary-length memory.
-+
-`strihash` and `memihash` are case insensitive versions.
-+
-`memihash_cont` is a variant of `memihash` that allows a computation to be
-continued with another chunk of data.
-
-`unsigned int sha1hash(const unsigned char *sha1)`::
-
-	Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
-	for use in hash tables. Cryptographic hashes are supposed to have
-	uniform distribution, so in contrast to `memhash()`, this just copies
-	the first `sizeof(int)` bytes without shuffling any bits. Note that
-	the results will be different on big-endian and little-endian
-	platforms, so they should not be stored or transferred over the net.
-
-`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
-
-	Initializes a hashmap structure.
-+
-`map` is the hashmap to initialize.
-+
-The `equals_function` can be specified to compare two entries for equality.
-If NULL, entries are considered equal if their hash codes are equal.
-+
-If the total number of entries is known in advance, the `initial_size`
-parameter may be used to preallocate a sufficiently large table and thus
-prevent expensive resizing. If 0, the table is dynamically resized.
-
-`void hashmap_free(struct hashmap *map, int free_entries)`::
-
-	Frees a hashmap structure and allocated memory.
-+
-`map` is the hashmap to free.
-+
-If `free_entries` is true, each hashmap_entry in the map is freed as well
-(using stdlib's free()).
-
-`void hashmap_entry_init(void *entry, unsigned int hash)`::
-
-	Initializes a hashmap_entry structure.
-+
-`entry` points to the entry to initialize.
-+
-`hash` is the hash code of the entry.
-+
-The hashmap_entry structure does not hold references to external resources,
-and it is safe to just discard it once you are done with it (i.e. if
-your structure was allocated with xmalloc(), you can just free(3) it,
-and if it is on stack, you can just let it go out of scope).
-
-`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
-
-	Returns the hashmap entry for the specified key, or NULL if not found.
-+
-`map` is the hashmap structure.
-+
-`key` is a hashmap_entry structure (or user data structure that starts with
-hashmap_entry) that has at least been initialized with the proper hash code
-(via `hashmap_entry_init`).
-+
-If an entry with matching hash code is found, `key` and `keydata` are passed
-to `hashmap_cmp_fn` to decide whether the entry matches the key.
-
-`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
-
-	Returns the hashmap entry for the specified hash code and key data,
-	or NULL if not found.
-+
-`map` is the hashmap structure.
-+
-`hash` is the hash code of the entry to look up.
-+
-If an entry with matching hash code is found, `keydata` is passed to
-`hashmap_cmp_fn` to decide whether the entry matches the key. The
-`entry_or_key` parameter points to a bogus hashmap_entry structure that
-should not be used in the comparison.
-
-`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
-
-	Returns the next equal hashmap entry, or NULL if not found. This can be
-	used to iterate over duplicate entries (see `hashmap_add`).
-+
-`map` is the hashmap structure.
-+
-`entry` is the hashmap_entry to start the search from, obtained via a previous
-call to `hashmap_get` or `hashmap_get_next`.
-
-`void hashmap_add(struct hashmap *map, void *entry)`::
-
-	Adds a hashmap entry. This allows to add duplicate entries (i.e.
-	separate values with the same key according to hashmap_cmp_fn).
-+
-`map` is the hashmap structure.
-+
-`entry` is the entry to add.
-
-`void *hashmap_put(struct hashmap *map, void *entry)`::
-
-	Adds or replaces a hashmap entry. If the hashmap contains duplicate
-	entries equal to the specified entry, only one of them will be replaced.
-+
-`map` is the hashmap structure.
-+
-`entry` is the entry to add or replace.
-+
-Returns the replaced entry, or NULL if not found (i.e. the entry was added).
-
-`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
-
-	Removes a hashmap entry matching the specified key. If the hashmap
-	contains duplicate entries equal to the specified key, only one of
-	them will be removed.
-+
-`map` is the hashmap structure.
-+
-`key` is a hashmap_entry structure (or user data structure that starts with
-hashmap_entry) that has at least been initialized with the proper hash code
-(via `hashmap_entry_init`).
-+
-If an entry with matching hash code is found, `key` and `keydata` are
-passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
-+
-Returns the removed entry, or NULL if not found.
-
-`void hashmap_disallow_rehash(struct hashmap *map, unsigned value)`::
-
-	Disallow/allow automatic rehashing of the hashmap during inserts
-	and deletes.
-+
-This is useful if the caller knows that the hashmap will be accessed
-by multiple threads.
-+
-The caller is still responsible for any necessary locking; this simply
-prevents unexpected rehashing.  The caller is also responsible for properly
-sizing the initial hashmap to ensure good performance.
-+
-A call to allow rehashing does not force a rehash; that might happen
-with the next insert or delete.
-
-`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
-`void *hashmap_iter_next(struct hashmap_iter *iter)`::
-`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
-
-	Used to iterate over all entries of a hashmap. Note that it is
-	not safe to add or remove entries to the hashmap while
-	iterating.
-+
-`hashmap_iter_init` initializes a `hashmap_iter` structure.
-+
-`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
-more entries.
-+
-`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
-and returns the first entry, if any).
-
-`const char *strintern(const char *string)`::
-`const void *memintern(const void *data, size_t len)`::
-
-	Returns the unique, interned version of the specified string or data,
-	similar to the `String.intern` API in Java and .NET, respectively.
-	Interned strings remain valid for the entire lifetime of the process.
-+
-Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
-strings / data must not be modified or freed.
-+
-Interned strings are best used for short strings with high probability of
-duplicates.
-+
-Uses a hashmap to store the pool of interned strings.
-
-Usage example
--------------
-
-Here's a simple usage example that maps long keys to double values.
-------------
-struct hashmap map;
-
-struct long2double {
-	struct hashmap_entry ent; /* must be the first member! */
-	long key;
-	double value;
-};
-
-static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
-{
-	return !(e1->key == e2->key);
-}
-
-void long2double_init(void)
-{
-	hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
-}
-
-void long2double_free(void)
-{
-	hashmap_free(&map, 1);
-}
-
-static struct long2double *find_entry(long key)
-{
-	struct long2double k;
-	hashmap_entry_init(&k, memhash(&key, sizeof(long)));
-	k.key = key;
-	return hashmap_get(&map, &k, NULL);
-}
-
-double get_value(long key)
-{
-	struct long2double *e = find_entry(key);
-	return e ? e->value : 0;
-}
-
-void set_value(long key, double value)
-{
-	struct long2double *e = find_entry(key);
-	if (!e) {
-		e = malloc(sizeof(struct long2double));
-		hashmap_entry_init(e, memhash(&key, sizeof(long)));
-		e->key = key;
-		hashmap_add(&map, e);
-	}
-	e->value = value;
-}
-------------
-
-Using variable-sized keys
--------------------------
-
-The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
-`hashmap_entry` structure as key to find the correct entry. If the key data is
-variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
-to create a full-fledged entry structure on the heap and copy all the key data
-into the structure.
-
-In this case, the `keydata` parameter can be used to pass
-variable-sized key data directly to the comparison function, and the `key`
-parameter can be a stripped-down, fixed size entry structure allocated on the
-stack.
-
-See test-hashmap.c for an example using arbitrary-length strings as keys.
diff --git a/hashmap.h b/hashmap.h
index 1c26bbad5b..2e0adbe5df 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -3,17 +3,123 @@
 
 /*
  * Generic implementation of hash-based key-value mappings.
- * See Documentation/technical/api-hashmap.txt.
+ *
+ * An example that maps long to a string:
+ * For the sake of the example this allows to lookup exact values, too
+ * (i.e. it is operated as a set, the value is part of the key)
+ * -------------------------------------
+ *
+ * struct hashmap map;
+ * struct long2string {
+ *     struct hashmap_entry ent; // must be the first member!
+ *     long key;
+ *     char value[FLEX_ARRAY];   // be careful with allocating on stack!
+ * };
+ *
+ * #define COMPARE_VALUE 1
+ *
+ * static int long2string_cmp(const struct long2string *e1,
+ *                            const struct long2string *e2,
+ *                            const void *keydata, const void *userdata)
+ * {
+ *     char *string = keydata;
+ *     unsigned *flags = (unsigned*)userdata;
+ *
+ *     if (flags & COMPARE_VALUE)
+ *         return !(e1->key == e2->key) || (keydata ?
+ *                  strcmp(e1->value, keydata) : strcmp(e1->value, e2->value));
+ *     else
+ *         return !(e1->key == e2->key);
+ * }
+ *
+ * int main(int argc, char **argv)
+ * {
+ *     long key;
+ *     char *value, *action;
+ *
+ *     unsigned flags = ALLOW_DUPLICATE_KEYS;
+ *
+ *     hashmap_init(&map, (hashmap_cmp_fn) long2string_cmp, &flags, 0);
+ *
+ *     while (scanf("%s %l %s", action, key, value)) {
+ *
+ *         if (!strcmp("add", action)) {
+ *             struct long2string *e;
+ *             e = malloc(sizeof(struct long2string) + strlen(value));
+ *             hashmap_entry_init(e, memhash(&key, sizeof(long)));
+ *             e->key = key;
+ *             memcpy(e->value, value, strlen(value));
+ *             hashmap_add(&map, e);
+ *         }
+ *
+ *         if (!strcmp("print_all_by_key", action)) {
+ *             flags &= ~COMPARE_VALUE;
+ *
+ *             struct long2string k;
+ *             hashmap_entry_init(&k, memhash(&key, sizeof(long)));
+ *             k.key = key;
+ *
+ *             struct long2string *e = hashmap_get(&map, &k, NULL);
+ *             if (e) {
+ *                 printf("first: %l %s\n", e->key, e->value);
+ *                 while (e = hashmap_get_next(&map, e))
+ *                     printf("found more: %l %s\n", e->key, e->value);
+ *             }
+ *         }
+ *
+ *         if (!strcmp("has_exact_match", action)) {
+ *             flags |= COMPARE_VALUE;
+ *
+ *             struct long2string *e;
+ *             e = malloc(sizeof(struct long2string) + strlen(value));
+ *             hashmap_entry_init(e, memhash(&key, sizeof(long)));
+ *             e->key = key;
+ *             memcpy(e->value, value, strlen(value));
+ *
+ *             printf("%s found\n", hashmap_get(&map, e, NULL) ? "" : "not");
+ *         }
+ *
+ *         if (!strcmp("has_exact_match_no_heap_alloc", action)) {
+ *             flags |= COMPARE_VALUE;
+ *
+ *             struct long2string e;
+ *             hashmap_entry_init(e, memhash(&key, sizeof(long)));
+ *             e.key = key;
+ *
+ *             printf("%s found\n", hashmap_get(&map, e, value) ? "" : "not");
+ *         }
+ *
+ *         if (!strcmp("end", action)) {
+ *             hashmap_free(&map, 1);
+ *             break;
+ *         }
+ *     }
+ * }
  */
 
-/* FNV-1 functions */
-
+/*
+ * Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
+ * http://www.isthe.com/chongo/tech/comp/fnv).
+ * `strhash` and `strihash` take 0-terminated strings, while `memhash` and
+ * `memihash` operate on arbitrary-length memory.
+ * `strihash` and `memihash` are case insensitive versions.
+ * `memihash_cont` is a variant of `memihash` that allows a computation to be
+ * continued with another chunk of data.
+ */
 extern unsigned int strhash(const char *buf);
 extern unsigned int strihash(const char *buf);
 extern unsigned int memhash(const void *buf, size_t len);
 extern unsigned int memihash(const void *buf, size_t len);
 extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len);
 
+/*
+ * Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
+ * for use in hash tables. Cryptographic hashes are supposed to have
+ * uniform distribution, so in contrast to `memhash()`, this just copies
+ * the first `sizeof(int)` bytes without shuffling any bits. Note that
+ * the results will be different on big-endian and little-endian
+ * platforms, so they should not be stored or transferred over the net.
+ */
 static inline unsigned int sha1hash(const unsigned char *sha1)
 {
 	/*
@@ -25,89 +131,255 @@ static inline unsigned int sha1hash(const unsigned char *sha1)
 	return hash;
 }
 
-/* data structures */
-
+/*
+ * struct hashmap_entry is an opaque structure representing an entry in the
+ * hash table, which must be used as first member of user data structures.
+ * Ideally it should be followed by an int-sized member to prevent unused
+ * memory on 64-bit systems due to alignment.
+ */
 struct hashmap_entry {
+	/*
+	 * next points to the next entry in case of collisions (i.e. if
+	 * multiple entries map to the same bucket)
+	 */
 	struct hashmap_entry *next;
+
+	/* entry's hash code */
 	unsigned int hash;
 };
 
+/*
+ * User-supplied function to test two hashmap entries for equality. Shall
+ * return 0 if the entries are equal.
+ *
+ * This function is always called with non-NULL `entry` and `entry_or_key`
+ * parameters that have the same hash code.
+ *
+ * When looking up an entry, the `key` and `keydata` parameters to hashmap_get
+ * and hashmap_remove are always passed as second `entry_or_key` and third
+ * argument `keydata`, respectively. Otherwise, `keydata` is NULL.
+ *
+ * When it is too expensive to allocate a user entry (either because it is
+ * large or varialbe sized, such that it is not on the stack), then the
+ * relevant data to check for equality should be passed via `keydata`.
+ * In this case `key` can be a stripped down version of the user key data
+ * or even just a hashmap_entry having the correct hash.
+ *
+ * The `data` entry is the pointer given in the init function and is always
+ * given.
+ */
 typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
-		const void *keydata, const void *cbdata);
+		const void *keydata, const void *data);
 
+/*
+ * struct hashmap is the hash table structure. Members can be used as follows,
+ * but should not be modified directly.
+ */
 struct hashmap {
 	struct hashmap_entry **table;
+
+	/* Stores the comparison function specified in `hashmap_init()`. */
 	hashmap_cmp_fn cmpfn;
 	const void *data;
-	unsigned int size, tablesize, grow_at, shrink_at;
-	unsigned disallow_rehash : 1;
-};
 
-struct hashmap_iter {
-	struct hashmap *map;
-	struct hashmap_entry *next;
-	unsigned int tablepos;
+	/* total number of entries (0 means the hashmap is empty) */
+	unsigned int size;
+
+	/*
+	 * tablesize is the allocated size of the hash table. A non-0 value
+	 * indicates that the hashmap is initialized. It may also be useful
+	 * for statistical purposes (i.e. `size / tablesize` is the current
+	 * load factor).
+	 */
+	unsigned int tablesize;
+
+	unsigned int grow_at;
+	unsigned int shrink_at;
+
+	/* See `hashmap_disallow_rehash`. */
+	unsigned disallow_rehash : 1;
 };
 
 /* hashmap functions */
 
+/*
+ * Initializes a hashmap structure.
+ *
+ * `map` is the hashmap to initialize.
+ *
+ * The `equals_function` can be specified to compare two entries for equality.
+ * If NULL, entries are considered equal if their hash codes are equal.
+ *
+ * The `data` parameter can be used to provide additional data (a callback
+ * cookie) that will be passed to `equals_function` each time it is called.
+ * This allows a single `equals_function` to implement multiple comparison
+ * functions.
+ *
+ * If the total number of entries is known in advance, the `initial_size`
+ * parameter may be used to preallocate a sufficiently large table and thus
+ * prevent expensive resizing. If 0, the table is dynamically resized.
+ */
 extern void hashmap_init(struct hashmap *map,
 			 hashmap_cmp_fn equals_function,
 			 const void *data,
 			 size_t initial_size);
+
+/*
+ * Frees a hashmap structure and allocated memory.
+ *
+ * If `free_entries` is true, each hashmap_entry in the map is freed as well
+ * using stdlibs free().
+ */
 extern void hashmap_free(struct hashmap *map, int free_entries);
 
 /* hashmap_entry functions */
 
+/*
+ * Initializes a hashmap_entry structure.
+ *
+ * `entry` points to the entry to initialize.
+ * `hash` is the hash code of the entry.
+ *
+ * The hashmap_entry structure does not hold references to external resources,
+ * and it is safe to just discard it once you are done with it (i.e. if
+ * your structure was allocated with xmalloc(), you can just free(3) it,
+ * and if it is on stack, you can just let it go out of scope).
+ */
 static inline void hashmap_entry_init(void *entry, unsigned int hash)
 {
 	struct hashmap_entry *e = entry;
 	e->hash = hash;
 	e->next = NULL;
 }
+
+/*
+ * Returns the hashmap entry for the specified key, or NULL if not found.
+ *
+ * `map` is the hashmap structure.
+ *
+ * `key` is a user data structure that starts with hashmap_entry that has at
+ * least been initialized with the proper hash code (via `hashmap_entry_init`).
+ *
+ * `keydata` is a data structure that holds just enough information to check
+ * for equality to a given entry.
+ *
+ * If the key data is variable-sized (e.g. a FLEX_ARRAY string) or quite large,
+ * it is undesirable to create a full-fledged entry structure on the heap and
+ * copy all the key data into the structure.
+ *
+ * In this case, the `keydata` parameter can be used to pass
+ * variable-sized key data directly to the comparison function, and the `key`
+ * parameter can be a stripped-down, fixed size entry structure allocated on the
+ * stack.
+ *
+ * If an entry with matching hash code is found, `key` and `keydata` are passed
+ * to `hashmap_cmp_fn` to decide whether the entry matches the key.
+ */
 extern void *hashmap_get(const struct hashmap *map, const void *key,
-		const void *keydata);
-extern void *hashmap_get_next(const struct hashmap *map, const void *entry);
-extern void hashmap_add(struct hashmap *map, void *entry);
-extern void *hashmap_put(struct hashmap *map, void *entry);
-extern void *hashmap_remove(struct hashmap *map, const void *key,
-		const void *keydata);
+			 const void *keydata);
 
+/*
+ * Returns the hashmap entry for the specified hash code and key data,
+ * or NULL if not found.
+ *
+ * `map` is the hashmap structure.
+ * `hash` is the hash code of the entry to look up.
+ *
+ * If an entry with matching hash code is found, `keydata` is passed to
+ * `hashmap_cmp_fn` to decide whether the entry matches the key. The
+ * `entry_or_key` parameter of `hashmap_cmp_fn` points to a hashmap_entry
+ * structure that should not be used in the comparison.
+ */
 static inline void *hashmap_get_from_hash(const struct hashmap *map,
-		unsigned int hash, const void *keydata)
+					  unsigned int hash,
+					  const void *keydata)
 {
 	struct hashmap_entry key;
 	hashmap_entry_init(&key, hash);
 	return hashmap_get(map, &key, keydata);
 }
 
+/*
+ * Returns the next equal hashmap entry, or NULL if not found. This can be
+ * used to iterate over duplicate entries (see `hashmap_add`).
+ *
+ * `map` is the hashmap structure.
+ * `entry` is the hashmap_entry to start the search from, obtained via a previous
+ * call to `hashmap_get` or `hashmap_get_next`.
+ */
+extern void *hashmap_get_next(const struct hashmap *map, const void *entry);
+
+/*
+ * Adds a hashmap entry. This allows to add duplicate entries (i.e.
+ * separate values with the same key according to hashmap_cmp_fn).
+ *
+ * `map` is the hashmap structure.
+ * `entry` is the entry to add.
+ */
+extern void hashmap_add(struct hashmap *map, void *entry);
+
+/*
+ * Adds or replaces a hashmap entry. If the hashmap contains duplicate
+ * entries equal to the specified entry, only one of them will be replaced.
+ *
+ * `map` is the hashmap structure.
+ * `entry` is the entry to add or replace.
+ * Returns the replaced entry, or NULL if not found (i.e. the entry was added).
+ */
+extern void *hashmap_put(struct hashmap *map, void *entry);
+
+/*
+ * Removes a hashmap entry matching the specified key. If the hashmap contains
+ * duplicate entries equal to the specified key, only one of them will be
+ * removed. Returns the removed entry, or NULL if not found.
+ *
+ * Argument explanation is the same as in `hashmap_get`.
+ */
+extern void *hashmap_remove(struct hashmap *map, const void *key,
+		const void *keydata);
+
+/*
+ * Returns the `bucket` an entry is stored in.
+ * Useful for multithreaded read access.
+ */
 int hashmap_bucket(const struct hashmap *map, unsigned int hash);
 
 /*
  * Disallow/allow rehashing of the hashmap.
- * This is useful if the caller knows that the hashmap
- * needs multi-threaded access.  The caller is still
- * required to guard/lock searches and inserts in a
- * manner appropriate to their usage.  This simply
- * prevents the table from being unexpectedly re-mapped.
+ * This is useful if the caller knows that the hashmap needs multi-threaded
+ * access.  The caller is still required to guard/lock searches and inserts
+ * in a manner appropriate to their usage.  This simply prevents the table
+ * from being unexpectedly re-mapped.
  *
- * If is up to the caller to ensure that the hashmap is
- * initialized to a reasonable size to prevent poor
- * performance.
+ * It is up to the caller to ensure that the hashmap is initialized to a
+ * reasonable size to prevent poor performance.
  *
- * When value=1, prevent future rehashes on adds and deleted.
- * When value=0, allow future rehahses.  This DOES NOT force
- * a rehash now.
+ * A call to allow rehashing does not force a rehash; that might happen
+ * with the next insert or delete.
  */
 static inline void hashmap_disallow_rehash(struct hashmap *map, unsigned value)
 {
 	map->disallow_rehash = value;
 }
 
-/* hashmap_iter functions */
+/*
+ * Used to iterate over all entries of a hashmap. Note that it is
+ * not safe to add or remove entries to the hashmap while
+ * iterating.
+ */
+struct hashmap_iter {
+	struct hashmap *map;
+	struct hashmap_entry *next;
+	unsigned int tablepos;
+};
 
+/* Initializes a `hashmap_iter` structure. */
 extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter);
+
+/* Returns the next hashmap_entry, or NULL if there are no more entries. */
 extern void *hashmap_iter_next(struct hashmap_iter *iter);
+
+/* Initializes the iterator and returns the first entry, if any. */
 static inline void *hashmap_iter_first(struct hashmap *map,
 		struct hashmap_iter *iter)
 {
@@ -115,8 +387,21 @@ static inline void *hashmap_iter_first(struct hashmap *map,
 	return hashmap_iter_next(iter);
 }
 
-/* string interning */
+/* String interning */
 
+/*
+ * Returns the unique, interned version of the specified string or data,
+ * similar to the `String.intern` API in Java and .NET, respectively.
+ * Interned strings remain valid for the entire lifetime of the process.
+ *
+ * Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
+ * strings / data must not be modified or freed.
+ *
+ * Interned strings are best used for short strings with high probability of
+ * duplicates.
+ *
+ * Uses a hashmap to store the pool of interned strings.
+ */
 extern const void *memintern(const void *data, size_t len);
 static inline const char *strintern(const char *string)
 {
-- 
2.13.0.31.g9b732c453e


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] hashmap.h: compare function has access to a data field
  2017-06-29 18:20       ` Stefan Beller
@ 2017-06-30 17:26         ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2017-06-30 17:26 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

> Yes it was a last minute squash before sending it out, as the fix was only
> two lines whereas the conversion is a lot. If it were separated I could have
> claimed the introduction to be a rather mechanical patch, but I did not
> make use of coccinelle or such, so the likelihood for errors is just as high.
>
> So I decided to squash them.

I somehow think that logic leads to a suboptimal workflow.  If they
were separated, somebody else could have done an independent
mechanical conversion to verify the result matches yours, which
would give us more confidence.  When such an independent mechanical
conversion does not match, we need one round-trip to ask you if it
was a misconversion or a manual tweak.

In any case, I think I've looked at it long enough to be reasonably
OK with the conversion result myself, so let's move it forward.  Of
course I welcome independent eyeballing by others.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field
  2017-06-29 23:53   ` [PATCHv2 1/2] hashmap.h: compare function has access to a data field Stefan Beller
@ 2017-06-30 17:34     ` Junio C Hamano
  2017-06-30 17:41       ` Stefan Beller
  2017-06-30 17:39     ` Junio C Hamano
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-06-30 17:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jrnieder, git

Stefan Beller <sbeller@google.com> writes:

> When using the hashmap a common need is to have access to arbitrary data
> in the compare function. A couple of times we abuse the keydata field
> to pass in the data needed. This happens for example in patch-ids.c.

It is not "arbitrary data"; it is very important to streess that we
are not just passing random crud, but adding a mechanism to
tailor/curry the function in a way that is fixed throughout the
lifetime of a hashmap.

> diff --git a/hashmap.h b/hashmap.h
> index de6022a3a9..1c26bbad5b 100644
> --- a/hashmap.h
> +++ b/hashmap.h
> @@ -33,11 +33,12 @@ struct hashmap_entry {
>  };
>  
>  typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
> -		const void *keydata);
> +		const void *keydata, const void *cbdata);

As I view the new "data" thing the C's (read: poor-man's) way to
cutomize the function, I would have tweaked the function signature
by giving the customization data at the front, i.e.

	fn(fndata, entry, entry_or_key, keydata);

That would hopefully make it more obvious that the new thing is
pairs with fn, not with other arguments (and entry-or-key and
keydata pairs, instead of three old arguments standing as equals).

As I think the way we wish to be able to express it in a better
language would have been something like:

	(partial_apply(fn, fndata))(entry, entry_or_key, keydata)

that order would match what is going on better.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field
  2017-06-29 23:53   ` [PATCHv2 1/2] hashmap.h: compare function has access to a data field Stefan Beller
  2017-06-30 17:34     ` Junio C Hamano
@ 2017-06-30 17:39     ` Junio C Hamano
  2017-06-30 18:00       ` Stefan Beller
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-06-30 17:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jrnieder, git

Stefan Beller <sbeller@google.com> writes:

> diff --git a/patch-ids.c b/patch-ids.c
> index 9c0ab9e67a..b9b2ebbad0 100644
> --- a/patch-ids.c
> +++ b/patch-ids.c
> @@ -37,6 +37,7 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
>   */
>  static int patch_id_cmp(struct patch_id *a,
>  			struct patch_id *b,
> +			const void *unused_keydata,
>  			struct diff_options *opt)
>  {
>  	if (is_null_oid(&a->patch_id) &&
> @@ -57,7 +58,8 @@ int init_patch_ids(struct patch_ids *ids)
>  	ids->diffopts.detect_rename = 0;
>  	DIFF_OPT_SET(&ids->diffopts, RECURSIVE);
>  	diff_setup_done(&ids->diffopts);
> -	hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp, 256);
> +	hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp,
> +		     &ids->diffopts, 256);
>  	return 0;
>  }
>  
> @@ -93,7 +95,7 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
>  	if (init_patch_id_entry(&patch, commit, ids))
>  		return NULL;
>  
> -	return hashmap_get(&ids->patches, &patch, &ids->diffopts);
> +	return hashmap_get(&ids->patches, &patch, NULL);
>  }

This actually makes me wonder if we can demonstrate an existing
breakage in tests.  The old code used to pass NULL to the diffopts,
causing it to be passed down through commit_patch_id() function down
to diff_tree_oid() or diff_root_tree_oid().  When the codepath
triggers the issue that Peff warned about in his old article (was it
about rehashing or something?) that makes two entries compared
(i.e. not using keydata, because we are not comparing an existing
entry with a key and data only to see if that already exists in the
hashmap), wouldn't that cause ll_diff_tree_oid() that is called from
diff_tree_oid() to dereference NULL?

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field
  2017-06-30 17:34     ` Junio C Hamano
@ 2017-06-30 17:41       ` Stefan Beller
  2017-06-30 18:47         ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Beller @ 2017-06-30 17:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Jonathan Nieder, Jeff Hostetler

On Fri, Jun 30, 2017 at 10:34 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> When using the hashmap a common need is to have access to arbitrary data
>> in the compare function. A couple of times we abuse the keydata field
>> to pass in the data needed. This happens for example in patch-ids.c.
>
> It is not "arbitrary data"; it is very important to streess that we
> are not just passing random crud, but adding a mechanism to
> tailor/curry the function in a way that is fixed throughout the
> lifetime of a hashmap.

Ah yes, I forgot to fix patch1, when spending all time on the docs in p2.

>
>> diff --git a/hashmap.h b/hashmap.h
>> index de6022a3a9..1c26bbad5b 100644
>> --- a/hashmap.h
>> +++ b/hashmap.h
>> @@ -33,11 +33,12 @@ struct hashmap_entry {
>>  };
>>
>>  typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
>> -             const void *keydata);
>> +             const void *keydata, const void *cbdata);
>
> As I view the new "data" thing the C's (read: poor-man's) way to
> cutomize the function, I would have tweaked the function signature
> by giving the customization data at the front, i.e.
>
>         fn(fndata, entry, entry_or_key, keydata);
>
> That would hopefully make it more obvious that the new thing is
> pairs with fn, not with other arguments (and entry-or-key and
> keydata pairs, instead of three old arguments standing as equals).

Ok, let me redo the patch to have fndata at the front.

Looking at other places (that have a similar mechanism mechanically,
but are semantically different), such as the callback functions for
the diff machinery, we have the user provided pointer at the end
of the list. But that is because the diff_options are the objects that
should be in front row (they are bound to the callback more than
some caller needed data).

    typedef void (*diff_format_fn_t)(struct diff_queue_struct *q,
          struct diff_options *options, void *data);

    typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options
*opt, void *data);

Thanks!

> As I think the way we wish to be able to express it in a better
> language would have been something like:
>
>         (partial_apply(fn, fndata))(entry, entry_or_key, keydata)
>
> that order would match what is going on better.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field
  2017-06-30 17:39     ` Junio C Hamano
@ 2017-06-30 18:00       ` Stefan Beller
  2017-06-30 18:40         ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Beller @ 2017-06-30 18:00 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King
  Cc: git@vger.kernel.org, Jonathan Nieder, Jeff Hostetler

On Fri, Jun 30, 2017 at 10:39 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> diff --git a/patch-ids.c b/patch-ids.c
>> index 9c0ab9e67a..b9b2ebbad0 100644
>> --- a/patch-ids.c
>> +++ b/patch-ids.c
>> @@ -37,6 +37,7 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
>>   */
>>  static int patch_id_cmp(struct patch_id *a,
>>                       struct patch_id *b,
>> +                     const void *unused_keydata,
>>                       struct diff_options *opt)
>>  {
>>       if (is_null_oid(&a->patch_id) &&
>> @@ -57,7 +58,8 @@ int init_patch_ids(struct patch_ids *ids)
>>       ids->diffopts.detect_rename = 0;
>>       DIFF_OPT_SET(&ids->diffopts, RECURSIVE);
>>       diff_setup_done(&ids->diffopts);
>> -     hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp, 256);
>> +     hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp,
>> +                  &ids->diffopts, 256);
>>       return 0;
>>  }
>>
>> @@ -93,7 +95,7 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
>>       if (init_patch_id_entry(&patch, commit, ids))
>>               return NULL;
>>
>> -     return hashmap_get(&ids->patches, &patch, &ids->diffopts);
>> +     return hashmap_get(&ids->patches, &patch, NULL);
>>  }

+cc Peff

>
> This actually makes me wonder if we can demonstrate an existing
> breakage in tests.  The old code used to pass NULL to the diffopts,
> causing it to be passed down through commit_patch_id() function down
> to diff_tree_oid() or diff_root_tree_oid().  When the codepath
> triggers the issue that Peff warned about in his old article (was it
> about rehashing or something?)  that makes two entries compared
> (i.e. not using keydata, because we are not comparing an existing
> entry with a key and data only to see if that already exists in the
> hashmap), wouldn't that cause ll_diff_tree_oid() that is called from
> diff_tree_oid() to dereference NULL?
>
> Thanks.

I am at a loss here after re-reading your answer over and over,
but I think you are asking if patch_id_cmp can break, as
we have a callchain like

  patch_id_cmp
    commit_patch_id
      (diff_root_tree_oid)
        diff_tree_oid
          ll_diff_tree_oid

passing diff_options down there, and patch_id_cmp may have
gotten NULL.

The answer is no, it was safe. (by accident?)

That is because we never use hashmap_get_next
on the hashmap that uses patch_id_cmp as the compare
function.

hashmap_get_next is the only function that does not pass
on a keydata, any other has valid caller provided keydata.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field
  2017-06-30 18:00       ` Stefan Beller
@ 2017-06-30 18:40         ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2017-06-30 18:40 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jeff King, git@vger.kernel.org, Jonathan Nieder, Jeff Hostetler

Stefan Beller <sbeller@google.com> writes:

> I am at a loss here after re-reading your answer over and over,
> but I think you are asking if patch_id_cmp can break, as
> we have a callchain like
>
>   patch_id_cmp
>     commit_patch_id
>       (diff_root_tree_oid)
>         diff_tree_oid
>           ll_diff_tree_oid
>
> passing diff_options down there, and patch_id_cmp may have
> gotten NULL.
>
> The answer is no, it was safe. (by accident?)
>
> That is because we never use hashmap_get_next
> on the hashmap that uses patch_id_cmp as the compare
> function.
>
> hashmap_get_next is the only function that does not pass
> on a keydata, any other has valid caller provided keydata.

Ah, thanks for clarifying.  I think I misread the earlier
discussion.  So unless somebody goes in to patch-ids.c and adds a
call to do get_next, we won't see a breakage, and it is not worth to
do a test-patch-ids.c that peeks into the hashmap patch-ids.c uses
and does get_next() only to demonstrate a potential future breakage.

OK.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field
  2017-06-30 17:41       ` Stefan Beller
@ 2017-06-30 18:47         ` Junio C Hamano
  2017-06-30 18:57           ` Stefan Beller
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-06-30 18:47 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Jonathan Nieder, Jeff Hostetler

Stefan Beller <sbeller@google.com> writes:

> Ok, let me redo the patch to have fndata at the front.
>
> Looking at other places (that have a similar mechanism mechanically,
> but are semantically different), such as the callback functions for
> the diff machinery, we have the user provided pointer at the end
> of the list. But that is because the diff_options are the objects that
> should be in front row (they are bound to the callback more than
> some caller needed data).

Quite honestly, I do not care too deeply about an API specific to a
particular area like "diff" that passes its "configuration" data
that everybody in the API knows, i.e. diff_options.  If the
convention for ordinary functions in the API is to pass that in a
particular location in the parameter list, I would think it is good
for a application-supplied callback function to follow that pattern.
After all, it is to configure the behaviour of the "diff" and the
caller-supplied callback could have been part of a (hypothetically
richer) API implementation.

But I view a comparison function that is given to hashmap that is
supplied by the caller a bit differently.  It is not about
"hashing", so the reason to have the data close to function pointer
is stronger there.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field
  2017-06-30 18:47         ` Junio C Hamano
@ 2017-06-30 18:57           ` Stefan Beller
  0 siblings, 0 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-30 18:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Jonathan Nieder, Jeff Hostetler

On Fri, Jun 30, 2017 at 11:47 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Ok, let me redo the patch to have fndata at the front.
>>
>> Looking at other places (that have a similar mechanism mechanically,
>> but are semantically different), such as the callback functions for
>> the diff machinery, we have the user provided pointer at the end
>> of the list. But that is because the diff_options are the objects that
>> should be in front row (they are bound to the callback more than
>> some caller needed data).
>
> Quite honestly, I do not care too deeply about an API specific to a
> particular area like "diff" that passes its "configuration" data
> that everybody in the API knows, i.e. diff_options.  If the
> convention for ordinary functions in the API is to pass that in a
> particular location in the parameter list, I would think it is good
> for a application-supplied callback function to follow that pattern.
> After all, it is to configure the behaviour of the "diff" and the
> caller-supplied callback could have been part of a (hypothetically
> richer) API implementation.
>
> But I view a comparison function that is given to hashmap that is
> supplied by the caller a bit differently.  It is not about
> "hashing", so the reason to have the data close to function pointer
> is stronger there.

Yes I agree with that, I forgot to say so.

I just cited the example to see if we have a preference in the project
already and that's about it, but as it is different, we can put the fndata
first.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCHv3 0/3] Introduce data field in hashmap and migrate docs to header
  2017-06-29 23:53 ` [PATCHv2 0/2] Introduce data field in hashmap and migrate docs to header Stefan Beller
  2017-06-29 23:53   ` [PATCHv2 1/2] hashmap.h: compare function has access to a data field Stefan Beller
  2017-06-29 23:53   ` [PATCHv2 2/2] hashmap: migrate documentation from Documentation/technical into header Stefan Beller
@ 2017-06-30 19:14   ` Stefan Beller
  2017-06-30 19:14     ` [PATCHv3 1/3] hashmap.h: compare function has access to a data field Stefan Beller
                       ` (2 more replies)
  2 siblings, 3 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-30 19:14 UTC (permalink / raw)
  To: sbeller; +Cc: git, git, gitster, jrnieder

v3:

* fixed first patch to call the data 'caller provided' instead of arbitrary.
* moved the position of the caller provided data to first position
* split up the rather mechanical change of function signature with
  fixing the API usage of patch-ids.c

v2:

addressed all but the last point of Jonathan Nieder.

Thanks,
Stefan

v1:

https://public-inbox.org/git/xmqqpodnvmmw.fsf@gitster.mtv.corp.google.com/
for context why we need a new data field.  Implement that.

Once upon a time we had a long discussion where to put documentation best.
The answer was header files as there documentation has less chance
to become stale and be out of date.  Improve the docs by
* migrating them to the header
* clarifying how the compare function is to be used
* how the arguments to hashmap_get/remove should be used.

Thanks,
Stefan

Stefan Beller (3):
  hashmap.h: compare function has access to a data field
  patch-ids.c: use hashmap correctly
  hashmap: migrate documentation from Documentation/technical into
    header

 Documentation/technical/api-hashmap.txt | 309 ---------------------------
 attr.c                                  |   7 +-
 builtin/describe.c                      |   8 +-
 builtin/difftool.c                      |  24 ++-
 builtin/fast-export.c                   |   7 +-
 config.c                                |   9 +-
 convert.c                               |   3 +-
 diffcore-rename.c                       |   2 +-
 hashmap.c                               |  17 +-
 hashmap.h                               | 360 ++++++++++++++++++++++++++++----
 name-hash.c                             |  16 +-
 oidset.c                                |   5 +-
 patch-ids.c                             |  10 +-
 refs.c                                  |   5 +-
 remote.c                                |   7 +-
 sha1_file.c                             |   5 +-
 sub-process.c                           |   7 +-
 sub-process.h                           |   6 +-
 submodule-config.c                      |  14 +-
 t/helper/test-hashmap.c                 |  19 +-
 20 files changed, 431 insertions(+), 409 deletions(-)
 delete mode 100644 Documentation/technical/api-hashmap.txt

-- 
2.13.0.31.g9b732c453e


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCHv3 1/3] hashmap.h: compare function has access to a data field
  2017-06-30 19:14   ` [PATCHv3 0/3] Introduce data field in hashmap and migrate docs to header Stefan Beller
@ 2017-06-30 19:14     ` Stefan Beller
  2017-06-30 19:14     ` [PATCHv3 2/3] patch-ids.c: use hashmap correctly Stefan Beller
  2017-06-30 19:14     ` [PATCHv3 3/3] hashmap: migrate documentation from Documentation/technical into header Stefan Beller
  2 siblings, 0 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-30 19:14 UTC (permalink / raw)
  To: sbeller; +Cc: git, git, gitster, jrnieder

When using the hashmap a common need is to have access to caller provided
data in the compare function. A couple of times we abuse the keydata field
to pass in the data needed. This happens for example in patch-ids.c.

This patch changes the function signature of the compare function
to have one more void pointer available. The pointer given for each
invocation of the compare function must be defined in the init function
of the hashmap and is just passed through.

Documentation of this new feature is deferred to a later patch.
This is a rather mechanical conversion, just adding the new pass-through
parameter.  However while at it improve the naming of the fields of all
compare functions used by hashmaps by ensuring unused parameters are
prefixed with 'unused_' and naming the parameters what they are (instead
of 'unused' make it 'unused_keydata').

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 attr.c                  |  7 ++++---
 builtin/describe.c      |  8 +++++---
 builtin/difftool.c      | 24 +++++++++++++++---------
 builtin/fast-export.c   |  7 ++++---
 config.c                |  9 ++++++---
 convert.c               |  3 ++-
 diffcore-rename.c       |  2 +-
 hashmap.c               | 17 ++++++++++++-----
 hashmap.h               | 12 ++++++++----
 name-hash.c             | 16 ++++++++++------
 oidset.c                |  5 +++--
 patch-ids.c             |  6 ++++--
 refs.c                  |  5 +++--
 remote.c                |  7 +++++--
 sha1_file.c             |  5 +++--
 sub-process.c           |  7 ++++---
 sub-process.h           |  6 ++++--
 submodule-config.c      | 14 ++++++++------
 t/helper/test-hashmap.c | 19 ++++++++++++-------
 19 files changed, 113 insertions(+), 66 deletions(-)

diff --git a/attr.c b/attr.c
index 37454999d2..56961f0236 100644
--- a/attr.c
+++ b/attr.c
@@ -76,9 +76,10 @@ struct attr_hash_entry {
 };
 
 /* attr_hashmap comparison function */
-static int attr_hash_entry_cmp(const struct attr_hash_entry *a,
+static int attr_hash_entry_cmp(void *unused_cmp_data,
+			       const struct attr_hash_entry *a,
 			       const struct attr_hash_entry *b,
-			       void *unused)
+			       void *unused_keydata)
 {
 	return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
 }
@@ -86,7 +87,7 @@ static int attr_hash_entry_cmp(const struct attr_hash_entry *a,
 /* Initialize an 'attr_hashmap' object */
 static void attr_hashmap_init(struct attr_hashmap *map)
 {
-	hashmap_init(&map->map, (hashmap_cmp_fn) attr_hash_entry_cmp, 0);
+	hashmap_init(&map->map, (hashmap_cmp_fn) attr_hash_entry_cmp, NULL, 0);
 }
 
 /*
diff --git a/builtin/describe.c b/builtin/describe.c
index 70eb144608..8868f00ed0 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -54,8 +54,10 @@ static const char *prio_names[] = {
 	N_("head"), N_("lightweight"), N_("annotated"),
 };
 
-static int commit_name_cmp(const struct commit_name *cn1,
-		const struct commit_name *cn2, const void *peeled)
+static int commit_name_cmp(const void *unused_cmp_data,
+			   const struct commit_name *cn1,
+			   const struct commit_name *cn2,
+			   const void *peeled)
 {
 	return oidcmp(&cn1->peeled, peeled ? peeled : &cn2->peeled);
 }
@@ -501,7 +503,7 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 		return cmd_name_rev(args.argc, args.argv, prefix);
 	}
 
-	hashmap_init(&names, (hashmap_cmp_fn) commit_name_cmp, 0);
+	hashmap_init(&names, (hashmap_cmp_fn) commit_name_cmp, NULL, 0);
 	for_each_rawref(get_name, NULL);
 	if (!names.size && !always)
 		die(_("No names found, cannot describe anything."));
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 9199227f6e..a1a26ba891 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -130,8 +130,10 @@ struct working_tree_entry {
 	char path[FLEX_ARRAY];
 };
 
-static int working_tree_entry_cmp(struct working_tree_entry *a,
-				  struct working_tree_entry *b, void *keydata)
+static int working_tree_entry_cmp(const void *unused_cmp_data,
+				  struct working_tree_entry *a,
+				  struct working_tree_entry *b,
+				  void *unused_keydata)
 {
 	return strcmp(a->path, b->path);
 }
@@ -146,7 +148,9 @@ struct pair_entry {
 	const char path[FLEX_ARRAY];
 };
 
-static int pair_cmp(struct pair_entry *a, struct pair_entry *b, void *keydata)
+static int pair_cmp(const void *unused_cmp_data,
+		    struct pair_entry *a, struct pair_entry *b,
+		    void *unused_keydata)
 {
 	return strcmp(a->path, b->path);
 }
@@ -174,7 +178,9 @@ struct path_entry {
 	char path[FLEX_ARRAY];
 };
 
-static int path_entry_cmp(struct path_entry *a, struct path_entry *b, void *key)
+static int path_entry_cmp(const void *unused_cmp_data,
+			  struct path_entry *a, struct path_entry *b,
+			  void *key)
 {
 	return strcmp(a->path, key ? key : b->path);
 }
@@ -367,9 +373,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	wtdir_len = wtdir.len;
 
 	hashmap_init(&working_tree_dups,
-		     (hashmap_cmp_fn)working_tree_entry_cmp, 0);
-	hashmap_init(&submodules, (hashmap_cmp_fn)pair_cmp, 0);
-	hashmap_init(&symlinks2, (hashmap_cmp_fn)pair_cmp, 0);
+		     (hashmap_cmp_fn)working_tree_entry_cmp, NULL, 0);
+	hashmap_init(&submodules, (hashmap_cmp_fn)pair_cmp, NULL, 0);
+	hashmap_init(&symlinks2, (hashmap_cmp_fn)pair_cmp, NULL, 0);
 
 	child.no_stdin = 1;
 	child.git_cmd = 1;
@@ -580,9 +586,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	 * files through the symlink.
 	 */
 	hashmap_init(&wt_modified, (hashmap_cmp_fn)path_entry_cmp,
-		     wtindex.cache_nr);
+		     NULL, wtindex.cache_nr);
 	hashmap_init(&tmp_modified, (hashmap_cmp_fn)path_entry_cmp,
-		     wtindex.cache_nr);
+		     NULL, wtindex.cache_nr);
 
 	for (i = 0; i < wtindex.cache_nr; i++) {
 		struct hashmap_entry dummy;
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 12d501bfde..d412c0a8f3 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -93,8 +93,9 @@ struct anonymized_entry {
 	size_t anon_len;
 };
 
-static int anonymized_entry_cmp(const void *va, const void *vb,
-				const void *data)
+static int anonymized_entry_cmp(const void *unused_cmp_data,
+				const void *va, const void *vb,
+				const void *unused_keydata)
 {
 	const struct anonymized_entry *a = va, *b = vb;
 	return a->orig_len != b->orig_len ||
@@ -113,7 +114,7 @@ static const void *anonymize_mem(struct hashmap *map,
 	struct anonymized_entry key, *ret;
 
 	if (!map->cmpfn)
-		hashmap_init(map, anonymized_entry_cmp, 0);
+		hashmap_init(map, anonymized_entry_cmp, NULL, 0);
 
 	hashmap_entry_init(&key, memhash(orig, *len));
 	key.orig = orig;
diff --git a/config.c b/config.c
index 1cd40a5fe6..4a31e31ac3 100644
--- a/config.c
+++ b/config.c
@@ -1753,15 +1753,18 @@ static int configset_add_value(struct config_set *cs, const char *key, const cha
 	return 0;
 }
 
-static int config_set_element_cmp(const struct config_set_element *e1,
-				 const struct config_set_element *e2, const void *unused)
+static int config_set_element_cmp(const void *unused_cmp_data,
+				  const struct config_set_element *e1,
+				  const struct config_set_element *e2,
+				  const void *unused_keydata)
 {
 	return strcmp(e1->key, e2->key);
 }
 
 void git_configset_init(struct config_set *cs)
 {
-	hashmap_init(&cs->config_hash, (hashmap_cmp_fn)config_set_element_cmp, 0);
+	hashmap_init(&cs->config_hash, (hashmap_cmp_fn)config_set_element_cmp,
+		     NULL, 0);
 	cs->hash_initialized = 1;
 	cs->list.nr = 0;
 	cs->list.alloc = 0;
diff --git a/convert.c b/convert.c
index 7d2a519daf..deaf0ba7b3 100644
--- a/convert.c
+++ b/convert.c
@@ -583,7 +583,8 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 
 	if (!subprocess_map_initialized) {
 		subprocess_map_initialized = 1;
-		hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp,
+			     NULL, 0);
 		entry = NULL;
 	} else {
 		entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd);
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1e4678b7d7..786f389498 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -341,7 +341,7 @@ static int find_exact_renames(struct diff_options *options)
 	/* Add all sources to the hash table in reverse order, because
 	 * later on they will be retrieved in LIFO order.
 	 */
-	hashmap_init(&file_table, NULL, rename_src_nr);
+	hashmap_init(&file_table, NULL, NULL, rename_src_nr);
 	for (i = rename_src_nr-1; i >= 0; i--)
 		insert_file_table(&file_table, i, rename_src[i].p->one);
 
diff --git a/hashmap.c b/hashmap.c
index 7d1044eb5d..9b6a12859b 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -95,7 +95,9 @@ static inline int entry_equals(const struct hashmap *map,
 		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
 		const void *keydata)
 {
-	return (e1 == e2) || (e1->hash == e2->hash && !map->cmpfn(e1, e2, keydata));
+	return (e1 == e2) ||
+	       (e1->hash == e2->hash &&
+		!map->cmpfn(map->cmpfn_data, e1, e2, keydata));
 }
 
 static inline unsigned int bucket(const struct hashmap *map,
@@ -140,19 +142,23 @@ static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
 	return e;
 }
 
-static int always_equal(const void *unused1, const void *unused2, const void *unused3)
+static int always_equal(const void *unused_cmp_data,
+			const void *unused1,
+			const void *unused2,
+			const void *unused_keydata)
 {
 	return 0;
 }
 
 void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		size_t initial_size)
+		const void *cmpfn_data, size_t initial_size)
 {
 	unsigned int size = HASHMAP_INITIAL_SIZE;
 
 	memset(map, 0, sizeof(*map));
 
 	map->cmpfn = equals_function ? equals_function : always_equal;
+	map->cmpfn_data = cmpfn_data;
 
 	/* calculate initial table size and allocate the table */
 	initial_size = (unsigned int) ((uint64_t) initial_size * 100
@@ -260,7 +266,8 @@ struct pool_entry {
 	unsigned char data[FLEX_ARRAY];
 };
 
-static int pool_entry_cmp(const struct pool_entry *e1,
+static int pool_entry_cmp(const void *unused_cmp_data,
+			  const struct pool_entry *e1,
 			  const struct pool_entry *e2,
 			  const unsigned char *keydata)
 {
@@ -275,7 +282,7 @@ const void *memintern(const void *data, size_t len)
 
 	/* initialize string pool hashmap */
 	if (!map.tablesize)
-		hashmap_init(&map, (hashmap_cmp_fn) pool_entry_cmp, 0);
+		hashmap_init(&map, (hashmap_cmp_fn) pool_entry_cmp, NULL, 0);
 
 	/* lookup interned string in pool */
 	hashmap_entry_init(&key, memhash(data, len));
diff --git a/hashmap.h b/hashmap.h
index de6022a3a9..aaf09e047e 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -32,12 +32,14 @@ struct hashmap_entry {
 	unsigned int hash;
 };
 
-typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
-		const void *keydata);
+typedef int (*hashmap_cmp_fn)(const void *hashmap_cmp_fn_data,
+			      const void *entry, const void *entry_or_key,
+			      const void *keydata);
 
 struct hashmap {
 	struct hashmap_entry **table;
 	hashmap_cmp_fn cmpfn;
+	const void *cmpfn_data;
 	unsigned int size, tablesize, grow_at, shrink_at;
 	unsigned disallow_rehash : 1;
 };
@@ -50,8 +52,10 @@ struct hashmap_iter {
 
 /* hashmap functions */
 
-extern void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
-		size_t initial_size);
+extern void hashmap_init(struct hashmap *map,
+			 hashmap_cmp_fn equals_function,
+			 const void *equals_function_data,
+			 size_t initial_size);
 extern void hashmap_free(struct hashmap *map, int free_entries);
 
 /* hashmap_entry functions */
diff --git a/name-hash.c b/name-hash.c
index 39309efb7f..0e10f3eab8 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -16,8 +16,10 @@ struct dir_entry {
 	char name[FLEX_ARRAY];
 };
 
-static int dir_entry_cmp(const struct dir_entry *e1,
-		const struct dir_entry *e2, const char *name)
+static int dir_entry_cmp(const void *unused_cmp_data,
+			 const struct dir_entry *e1,
+			 const struct dir_entry *e2,
+			 const char *name)
 {
 	return e1->namelen != e2->namelen || strncasecmp(e1->name,
 			name ? name : e2->name, e1->namelen);
@@ -107,8 +109,10 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 		add_dir_entry(istate, ce);
 }
 
-static int cache_entry_cmp(const struct cache_entry *ce1,
-		const struct cache_entry *ce2, const void *remove)
+static int cache_entry_cmp(const void *unused_cmp_data,
+			   const struct cache_entry *ce1,
+			   const struct cache_entry *ce2,
+			   const void *remove)
 {
 	/*
 	 * For remove_name_hash, find the exact entry (pointer equality); for
@@ -571,9 +575,9 @@ static void lazy_init_name_hash(struct index_state *istate)
 	if (istate->name_hash_initialized)
 		return;
 	hashmap_init(&istate->name_hash, (hashmap_cmp_fn) cache_entry_cmp,
-			istate->cache_nr);
+			NULL, istate->cache_nr);
 	hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp,
-			istate->cache_nr);
+			NULL, istate->cache_nr);
 
 	if (lookup_lazy_params(istate)) {
 		hashmap_disallow_rehash(&istate->dir_hash, 1);
diff --git a/oidset.c b/oidset.c
index ac169f05d3..a6a08ba52a 100644
--- a/oidset.c
+++ b/oidset.c
@@ -6,7 +6,8 @@ struct oidset_entry {
 	struct object_id oid;
 };
 
-static int oidset_hashcmp(const void *va, const void *vb,
+static int oidset_hashcmp(const void *unused_cmp_data,
+			  const void *va, const void *vb,
 			  const void *vkey)
 {
 	const struct oidset_entry *a = va, *b = vb;
@@ -30,7 +31,7 @@ int oidset_insert(struct oidset *set, const struct object_id *oid)
 	struct oidset_entry *entry;
 
 	if (!set->map.cmpfn)
-		hashmap_init(&set->map, oidset_hashcmp, 0);
+		hashmap_init(&set->map, oidset_hashcmp, NULL, 0);
 
 	if (oidset_contains(set, oid))
 		return 1;
diff --git a/patch-ids.c b/patch-ids.c
index 9c0ab9e67a..815c115811 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -35,7 +35,8 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
  * the side of safety.  The actual value being negative does not have
  * any significance; only that it is non-zero matters.
  */
-static int patch_id_cmp(struct patch_id *a,
+static int patch_id_cmp(const void *unused_cmp_data,
+			struct patch_id *a,
 			struct patch_id *b,
 			struct diff_options *opt)
 {
@@ -57,7 +58,8 @@ int init_patch_ids(struct patch_ids *ids)
 	ids->diffopts.detect_rename = 0;
 	DIFF_OPT_SET(&ids->diffopts, RECURSIVE);
 	diff_setup_done(&ids->diffopts);
-	hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp, 256);
+	hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp,
+		     NULL, 256);
 	return 0;
 }
 
diff --git a/refs.c b/refs.c
index 88658ba769..eef1a7b67d 100644
--- a/refs.c
+++ b/refs.c
@@ -1525,7 +1525,8 @@ struct ref_store_hash_entry
 	char name[FLEX_ARRAY];
 };
 
-static int ref_store_hash_cmp(const void *entry, const void *entry_or_key,
+static int ref_store_hash_cmp(const void *unused_cmp_data,
+			      const void *entry, const void *entry_or_key,
 			      const void *keydata)
 {
 	const struct ref_store_hash_entry *e1 = entry, *e2 = entry_or_key;
@@ -1608,7 +1609,7 @@ static void register_ref_store_map(struct hashmap *map,
 				   const char *name)
 {
 	if (!map->tablesize)
-		hashmap_init(map, ref_store_hash_cmp, 0);
+		hashmap_init(map, ref_store_hash_cmp, NULL, 0);
 
 	if (hashmap_put(map, alloc_ref_store_hash_entry(name, refs)))
 		die("BUG: %s ref_store '%s' initialized twice", type, name);
diff --git a/remote.c b/remote.c
index d87482573d..60d0043921 100644
--- a/remote.c
+++ b/remote.c
@@ -133,7 +133,10 @@ struct remotes_hash_key {
 	int len;
 };
 
-static int remotes_hash_cmp(const struct remote *a, const struct remote *b, const struct remotes_hash_key *key)
+static int remotes_hash_cmp(const void *unused_cmp_data,
+			    const struct remote *a,
+			    const struct remote *b,
+			    const struct remotes_hash_key *key)
 {
 	if (key)
 		return strncmp(a->name, key->str, key->len) || a->name[key->len];
@@ -144,7 +147,7 @@ static int remotes_hash_cmp(const struct remote *a, const struct remote *b, cons
 static inline void init_remotes_hash(void)
 {
 	if (!remotes_hash.cmpfn)
-		hashmap_init(&remotes_hash, (hashmap_cmp_fn)remotes_hash_cmp, 0);
+		hashmap_init(&remotes_hash, (hashmap_cmp_fn)remotes_hash_cmp, NULL, 0);
 }
 
 static struct remote *make_remote(const char *name, int len)
diff --git a/sha1_file.c b/sha1_file.c
index fb1fd809dc..8c523a5b74 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2389,7 +2389,8 @@ static int delta_base_cache_key_eq(const struct delta_base_cache_key *a,
 	return a->p == b->p && a->base_offset == b->base_offset;
 }
 
-static int delta_base_cache_hash_cmp(const void *va, const void *vb,
+static int delta_base_cache_hash_cmp(const void *unused_cmp_data,
+				     const void *va, const void *vb,
 				     const void *vkey)
 {
 	const struct delta_base_cache_entry *a = va, *b = vb;
@@ -2472,7 +2473,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	list_add_tail(&ent->lru, &delta_base_cache_lru);
 
 	if (!delta_base_cache.cmpfn)
-		hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, 0);
+		hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, NULL, 0);
 	hashmap_entry_init(ent, pack_entry_hash(p, base_offset));
 	hashmap_add(&delta_base_cache, ent);
 }
diff --git a/sub-process.c b/sub-process.c
index 92f8aea70a..a3cfab1a9d 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -5,9 +5,10 @@
 #include "sigchain.h"
 #include "pkt-line.h"
 
-int cmd2process_cmp(const struct subprocess_entry *e1,
-			   const struct subprocess_entry *e2,
-			   const void *unused)
+int cmd2process_cmp(const void *unused_cmp_data,
+		    const struct subprocess_entry *e1,
+		    const struct subprocess_entry *e2,
+		    const void *unused_keydata)
 {
 	return strcmp(e1->cmd, e2->cmd);
 }
diff --git a/sub-process.h b/sub-process.h
index d9a45cd359..96a2cca360 100644
--- a/sub-process.h
+++ b/sub-process.h
@@ -20,8 +20,10 @@ struct subprocess_entry {
 
 /* subprocess functions */
 
-int cmd2process_cmp(const struct subprocess_entry *e1,
-	const struct subprocess_entry *e2, const void *unused);
+extern int cmd2process_cmp(const void *unused_cmp_data,
+			   const struct subprocess_entry *e1,
+			   const struct subprocess_entry *e2,
+			   const void *unused_keydata);
 
 typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
 int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
diff --git a/submodule-config.c b/submodule-config.c
index d8f8d5ea32..0e1126183d 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -34,17 +34,19 @@ enum lookup_type {
 static struct submodule_cache the_submodule_cache;
 static int is_cache_init;
 
-static int config_path_cmp(const struct submodule_entry *a,
+static int config_path_cmp(const void *unused_cmp_data,
+			   const struct submodule_entry *a,
 			   const struct submodule_entry *b,
-			   const void *unused)
+			   const void *unused_keydata)
 {
 	return strcmp(a->config->path, b->config->path) ||
 	       hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
 }
 
-static int config_name_cmp(const struct submodule_entry *a,
+static int config_name_cmp(const void *unused_cmp_data,
+			   const struct submodule_entry *a,
 			   const struct submodule_entry *b,
-			   const void *unused)
+			   const void *unused_keydata)
 {
 	return strcmp(a->config->name, b->config->name) ||
 	       hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
@@ -52,8 +54,8 @@ static int config_name_cmp(const struct submodule_entry *a,
 
 static void cache_init(struct submodule_cache *cache)
 {
-	hashmap_init(&cache->for_path, (hashmap_cmp_fn) config_path_cmp, 0);
-	hashmap_init(&cache->for_name, (hashmap_cmp_fn) config_name_cmp, 0);
+	hashmap_init(&cache->for_path, (hashmap_cmp_fn) config_path_cmp, NULL, 0);
+	hashmap_init(&cache->for_name, (hashmap_cmp_fn) config_name_cmp, NULL, 0);
 }
 
 static void free_one_config(struct submodule_entry *entry)
diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index 7aa9440e27..095d7395f3 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -13,14 +13,18 @@ static const char *get_value(const struct test_entry *e)
 	return e->key + strlen(e->key) + 1;
 }
 
-static int test_entry_cmp(const struct test_entry *e1,
-		const struct test_entry *e2, const char* key)
+static int test_entry_cmp(const void *unused_cmp_data,
+			  const struct test_entry *e1,
+			  const struct test_entry *e2,
+			  const char* key)
 {
 	return strcmp(e1->key, key ? key : e2->key);
 }
 
-static int test_entry_cmp_icase(const struct test_entry *e1,
-		const struct test_entry *e2, const char* key)
+static int test_entry_cmp_icase(const void *unused_cmp_data,
+				const struct test_entry *e1,
+				const struct test_entry *e2,
+				const char* key)
 {
 	return strcasecmp(e1->key, key ? key : e2->key);
 }
@@ -92,7 +96,8 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 	if (method & TEST_ADD) {
 		/* test adding to the map */
 		for (j = 0; j < rounds; j++) {
-			hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+			hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp,
+				     NULL, 0);
 
 			/* add entries */
 			for (i = 0; i < TEST_SIZE; i++) {
@@ -104,7 +109,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 		}
 	} else {
 		/* test map lookups */
-		hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+		hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, NULL, 0);
 
 		/* fill the map (sparsely if specified) */
 		j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE;
@@ -147,7 +152,7 @@ int cmd_main(int argc, const char **argv)
 	/* init hash map */
 	icase = argc > 1 && !strcmp("ignorecase", argv[1]);
 	hashmap_init(&map, (hashmap_cmp_fn) (icase ? test_entry_cmp_icase
-			: test_entry_cmp), 0);
+			: test_entry_cmp), NULL, 0);
 
 	/* process commands from stdin */
 	while (fgets(line, sizeof(line), stdin)) {
-- 
2.13.0.31.g9b732c453e


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv3 2/3] patch-ids.c: use hashmap correctly
  2017-06-30 19:14   ` [PATCHv3 0/3] Introduce data field in hashmap and migrate docs to header Stefan Beller
  2017-06-30 19:14     ` [PATCHv3 1/3] hashmap.h: compare function has access to a data field Stefan Beller
@ 2017-06-30 19:14     ` Stefan Beller
  2017-06-30 19:50       ` Junio C Hamano
  2017-07-05  9:00       ` Jeff King
  2017-06-30 19:14     ` [PATCHv3 3/3] hashmap: migrate documentation from Documentation/technical into header Stefan Beller
  2 siblings, 2 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-30 19:14 UTC (permalink / raw)
  To: sbeller; +Cc: git, git, gitster, jrnieder

As eluded to in the previous patch, the code in patch-ids.c is using
the hashmaps API wrong.

Luckily we do not have a bug, as all hashmap functionality that we use
here (hashmap_get) passes through the keydata.  If hashmap_get_next were
to be used, a bug would occur as that passes NULL for the key_data.

So instead use the hashmap API correctly and provide the caller required
data in the compare function via the first argument that always gets
passed and was setup via the hashmap_init function.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 patch-ids.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/patch-ids.c b/patch-ids.c
index 815c115811..b4166b0f38 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -35,10 +35,10 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
  * the side of safety.  The actual value being negative does not have
  * any significance; only that it is non-zero matters.
  */
-static int patch_id_cmp(const void *unused_cmp_data,
+static int patch_id_cmp(struct diff_options *opt,
 			struct patch_id *a,
 			struct patch_id *b,
-			struct diff_options *opt)
+			const void *unused_keydata)
 {
 	if (is_null_oid(&a->patch_id) &&
 	    commit_patch_id(a->commit, opt, &a->patch_id, 0))
@@ -59,7 +59,7 @@ int init_patch_ids(struct patch_ids *ids)
 	DIFF_OPT_SET(&ids->diffopts, RECURSIVE);
 	diff_setup_done(&ids->diffopts);
 	hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp,
-		     NULL, 256);
+		     &ids->diffopts, 256);
 	return 0;
 }
 
@@ -95,7 +95,7 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
 	if (init_patch_id_entry(&patch, commit, ids))
 		return NULL;
 
-	return hashmap_get(&ids->patches, &patch, &ids->diffopts);
+	return hashmap_get(&ids->patches, &patch, NULL);
 }
 
 struct patch_id *add_commit_patch_id(struct commit *commit,
-- 
2.13.0.31.g9b732c453e


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv3 3/3] hashmap: migrate documentation from Documentation/technical into header
  2017-06-30 19:14   ` [PATCHv3 0/3] Introduce data field in hashmap and migrate docs to header Stefan Beller
  2017-06-30 19:14     ` [PATCHv3 1/3] hashmap.h: compare function has access to a data field Stefan Beller
  2017-06-30 19:14     ` [PATCHv3 2/3] patch-ids.c: use hashmap correctly Stefan Beller
@ 2017-06-30 19:14     ` Stefan Beller
  2 siblings, 0 replies; 26+ messages in thread
From: Stefan Beller @ 2017-06-30 19:14 UTC (permalink / raw)
  To: sbeller; +Cc: git, git, gitster, jrnieder

While at it, clarify the use of `key`, `keydata`, `entry_or_key` as well
as documenting the new data pointer for the compare function.

Rework the example.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/technical/api-hashmap.txt | 309 ----------------------------
 hashmap.h                               | 348 +++++++++++++++++++++++++++++---
 2 files changed, 316 insertions(+), 341 deletions(-)
 delete mode 100644 Documentation/technical/api-hashmap.txt

diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
deleted file mode 100644
index ccc634bbd7..0000000000
--- a/Documentation/technical/api-hashmap.txt
+++ /dev/null
@@ -1,309 +0,0 @@
-hashmap API
-===========
-
-The hashmap API is a generic implementation of hash-based key-value mappings.
-
-Data Structures
----------------
-
-`struct hashmap`::
-
-	The hash table structure. Members can be used as follows, but should
-	not be modified directly:
-+
-The `size` member keeps track of the total number of entries (0 means the
-hashmap is empty).
-+
-`tablesize` is the allocated size of the hash table. A non-0 value indicates
-that the hashmap is initialized. It may also be useful for statistical purposes
-(i.e. `size / tablesize` is the current load factor).
-+
-`cmpfn` stores the comparison function specified in `hashmap_init()`. In
-advanced scenarios, it may be useful to change this, e.g. to switch between
-case-sensitive and case-insensitive lookup.
-+
-When `disallow_rehash` is set, automatic rehashes are prevented during inserts
-and deletes.
-
-`struct hashmap_entry`::
-
-	An opaque structure representing an entry in the hash table, which must
-	be used as first member of user data structures. Ideally it should be
-	followed by an int-sized member to prevent unused memory on 64-bit
-	systems due to alignment.
-+
-The `hash` member is the entry's hash code and the `next` member points to the
-next entry in case of collisions (i.e. if multiple entries map to the same
-bucket).
-
-`struct hashmap_iter`::
-
-	An iterator structure, to be used with hashmap_iter_* functions.
-
-Types
------
-
-`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
-
-	User-supplied function to test two hashmap entries for equality. Shall
-	return 0 if the entries are equal.
-+
-This function is always called with non-NULL `entry` / `entry_or_key`
-parameters that have the same hash code. When looking up an entry, the `key`
-and `keydata` parameters to hashmap_get and hashmap_remove are always passed
-as second and third argument, respectively. Otherwise, `keydata` is NULL.
-
-Functions
----------
-
-`unsigned int strhash(const char *buf)`::
-`unsigned int strihash(const char *buf)`::
-`unsigned int memhash(const void *buf, size_t len)`::
-`unsigned int memihash(const void *buf, size_t len)`::
-`unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len)`::
-
-	Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
-	http://www.isthe.com/chongo/tech/comp/fnv).
-+
-`strhash` and `strihash` take 0-terminated strings, while `memhash` and
-`memihash` operate on arbitrary-length memory.
-+
-`strihash` and `memihash` are case insensitive versions.
-+
-`memihash_cont` is a variant of `memihash` that allows a computation to be
-continued with another chunk of data.
-
-`unsigned int sha1hash(const unsigned char *sha1)`::
-
-	Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
-	for use in hash tables. Cryptographic hashes are supposed to have
-	uniform distribution, so in contrast to `memhash()`, this just copies
-	the first `sizeof(int)` bytes without shuffling any bits. Note that
-	the results will be different on big-endian and little-endian
-	platforms, so they should not be stored or transferred over the net.
-
-`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
-
-	Initializes a hashmap structure.
-+
-`map` is the hashmap to initialize.
-+
-The `equals_function` can be specified to compare two entries for equality.
-If NULL, entries are considered equal if their hash codes are equal.
-+
-If the total number of entries is known in advance, the `initial_size`
-parameter may be used to preallocate a sufficiently large table and thus
-prevent expensive resizing. If 0, the table is dynamically resized.
-
-`void hashmap_free(struct hashmap *map, int free_entries)`::
-
-	Frees a hashmap structure and allocated memory.
-+
-`map` is the hashmap to free.
-+
-If `free_entries` is true, each hashmap_entry in the map is freed as well
-(using stdlib's free()).
-
-`void hashmap_entry_init(void *entry, unsigned int hash)`::
-
-	Initializes a hashmap_entry structure.
-+
-`entry` points to the entry to initialize.
-+
-`hash` is the hash code of the entry.
-+
-The hashmap_entry structure does not hold references to external resources,
-and it is safe to just discard it once you are done with it (i.e. if
-your structure was allocated with xmalloc(), you can just free(3) it,
-and if it is on stack, you can just let it go out of scope).
-
-`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
-
-	Returns the hashmap entry for the specified key, or NULL if not found.
-+
-`map` is the hashmap structure.
-+
-`key` is a hashmap_entry structure (or user data structure that starts with
-hashmap_entry) that has at least been initialized with the proper hash code
-(via `hashmap_entry_init`).
-+
-If an entry with matching hash code is found, `key` and `keydata` are passed
-to `hashmap_cmp_fn` to decide whether the entry matches the key.
-
-`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
-
-	Returns the hashmap entry for the specified hash code and key data,
-	or NULL if not found.
-+
-`map` is the hashmap structure.
-+
-`hash` is the hash code of the entry to look up.
-+
-If an entry with matching hash code is found, `keydata` is passed to
-`hashmap_cmp_fn` to decide whether the entry matches the key. The
-`entry_or_key` parameter points to a bogus hashmap_entry structure that
-should not be used in the comparison.
-
-`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
-
-	Returns the next equal hashmap entry, or NULL if not found. This can be
-	used to iterate over duplicate entries (see `hashmap_add`).
-+
-`map` is the hashmap structure.
-+
-`entry` is the hashmap_entry to start the search from, obtained via a previous
-call to `hashmap_get` or `hashmap_get_next`.
-
-`void hashmap_add(struct hashmap *map, void *entry)`::
-
-	Adds a hashmap entry. This allows to add duplicate entries (i.e.
-	separate values with the same key according to hashmap_cmp_fn).
-+
-`map` is the hashmap structure.
-+
-`entry` is the entry to add.
-
-`void *hashmap_put(struct hashmap *map, void *entry)`::
-
-	Adds or replaces a hashmap entry. If the hashmap contains duplicate
-	entries equal to the specified entry, only one of them will be replaced.
-+
-`map` is the hashmap structure.
-+
-`entry` is the entry to add or replace.
-+
-Returns the replaced entry, or NULL if not found (i.e. the entry was added).
-
-`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
-
-	Removes a hashmap entry matching the specified key. If the hashmap
-	contains duplicate entries equal to the specified key, only one of
-	them will be removed.
-+
-`map` is the hashmap structure.
-+
-`key` is a hashmap_entry structure (or user data structure that starts with
-hashmap_entry) that has at least been initialized with the proper hash code
-(via `hashmap_entry_init`).
-+
-If an entry with matching hash code is found, `key` and `keydata` are
-passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
-+
-Returns the removed entry, or NULL if not found.
-
-`void hashmap_disallow_rehash(struct hashmap *map, unsigned value)`::
-
-	Disallow/allow automatic rehashing of the hashmap during inserts
-	and deletes.
-+
-This is useful if the caller knows that the hashmap will be accessed
-by multiple threads.
-+
-The caller is still responsible for any necessary locking; this simply
-prevents unexpected rehashing.  The caller is also responsible for properly
-sizing the initial hashmap to ensure good performance.
-+
-A call to allow rehashing does not force a rehash; that might happen
-with the next insert or delete.
-
-`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
-`void *hashmap_iter_next(struct hashmap_iter *iter)`::
-`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
-
-	Used to iterate over all entries of a hashmap. Note that it is
-	not safe to add or remove entries to the hashmap while
-	iterating.
-+
-`hashmap_iter_init` initializes a `hashmap_iter` structure.
-+
-`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
-more entries.
-+
-`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
-and returns the first entry, if any).
-
-`const char *strintern(const char *string)`::
-`const void *memintern(const void *data, size_t len)`::
-
-	Returns the unique, interned version of the specified string or data,
-	similar to the `String.intern` API in Java and .NET, respectively.
-	Interned strings remain valid for the entire lifetime of the process.
-+
-Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
-strings / data must not be modified or freed.
-+
-Interned strings are best used for short strings with high probability of
-duplicates.
-+
-Uses a hashmap to store the pool of interned strings.
-
-Usage example
--------------
-
-Here's a simple usage example that maps long keys to double values.
-------------
-struct hashmap map;
-
-struct long2double {
-	struct hashmap_entry ent; /* must be the first member! */
-	long key;
-	double value;
-};
-
-static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
-{
-	return !(e1->key == e2->key);
-}
-
-void long2double_init(void)
-{
-	hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
-}
-
-void long2double_free(void)
-{
-	hashmap_free(&map, 1);
-}
-
-static struct long2double *find_entry(long key)
-{
-	struct long2double k;
-	hashmap_entry_init(&k, memhash(&key, sizeof(long)));
-	k.key = key;
-	return hashmap_get(&map, &k, NULL);
-}
-
-double get_value(long key)
-{
-	struct long2double *e = find_entry(key);
-	return e ? e->value : 0;
-}
-
-void set_value(long key, double value)
-{
-	struct long2double *e = find_entry(key);
-	if (!e) {
-		e = malloc(sizeof(struct long2double));
-		hashmap_entry_init(e, memhash(&key, sizeof(long)));
-		e->key = key;
-		hashmap_add(&map, e);
-	}
-	e->value = value;
-}
-------------
-
-Using variable-sized keys
--------------------------
-
-The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
-`hashmap_entry` structure as key to find the correct entry. If the key data is
-variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
-to create a full-fledged entry structure on the heap and copy all the key data
-into the structure.
-
-In this case, the `keydata` parameter can be used to pass
-variable-sized key data directly to the comparison function, and the `key`
-parameter can be a stripped-down, fixed size entry structure allocated on the
-stack.
-
-See test-hashmap.c for an example using arbitrary-length strings as keys.
diff --git a/hashmap.h b/hashmap.h
index aaf09e047e..7a8fa7fa3d 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -3,17 +3,123 @@
 
 /*
  * Generic implementation of hash-based key-value mappings.
- * See Documentation/technical/api-hashmap.txt.
+ *
+ * An example that maps long to a string:
+ * For the sake of the example this allows to lookup exact values, too
+ * (i.e. it is operated as a set, the value is part of the key)
+ * -------------------------------------
+ *
+ * struct hashmap map;
+ * struct long2string {
+ *     struct hashmap_entry ent; // must be the first member!
+ *     long key;
+ *     char value[FLEX_ARRAY];   // be careful with allocating on stack!
+ * };
+ *
+ * #define COMPARE_VALUE 1
+ *
+ * static int long2string_cmp(const struct long2string *e1,
+ *                            const struct long2string *e2,
+ *                            const void *keydata, const void *userdata)
+ * {
+ *     char *string = keydata;
+ *     unsigned *flags = (unsigned*)userdata;
+ *
+ *     if (flags & COMPARE_VALUE)
+ *         return !(e1->key == e2->key) || (keydata ?
+ *                  strcmp(e1->value, keydata) : strcmp(e1->value, e2->value));
+ *     else
+ *         return !(e1->key == e2->key);
+ * }
+ *
+ * int main(int argc, char **argv)
+ * {
+ *     long key;
+ *     char *value, *action;
+ *
+ *     unsigned flags = ALLOW_DUPLICATE_KEYS;
+ *
+ *     hashmap_init(&map, (hashmap_cmp_fn) long2string_cmp, &flags, 0);
+ *
+ *     while (scanf("%s %l %s", action, key, value)) {
+ *
+ *         if (!strcmp("add", action)) {
+ *             struct long2string *e;
+ *             e = malloc(sizeof(struct long2string) + strlen(value));
+ *             hashmap_entry_init(e, memhash(&key, sizeof(long)));
+ *             e->key = key;
+ *             memcpy(e->value, value, strlen(value));
+ *             hashmap_add(&map, e);
+ *         }
+ *
+ *         if (!strcmp("print_all_by_key", action)) {
+ *             flags &= ~COMPARE_VALUE;
+ *
+ *             struct long2string k;
+ *             hashmap_entry_init(&k, memhash(&key, sizeof(long)));
+ *             k.key = key;
+ *
+ *             struct long2string *e = hashmap_get(&map, &k, NULL);
+ *             if (e) {
+ *                 printf("first: %l %s\n", e->key, e->value);
+ *                 while (e = hashmap_get_next(&map, e))
+ *                     printf("found more: %l %s\n", e->key, e->value);
+ *             }
+ *         }
+ *
+ *         if (!strcmp("has_exact_match", action)) {
+ *             flags |= COMPARE_VALUE;
+ *
+ *             struct long2string *e;
+ *             e = malloc(sizeof(struct long2string) + strlen(value));
+ *             hashmap_entry_init(e, memhash(&key, sizeof(long)));
+ *             e->key = key;
+ *             memcpy(e->value, value, strlen(value));
+ *
+ *             printf("%s found\n", hashmap_get(&map, e, NULL) ? "" : "not");
+ *         }
+ *
+ *         if (!strcmp("has_exact_match_no_heap_alloc", action)) {
+ *             flags |= COMPARE_VALUE;
+ *
+ *             struct long2string e;
+ *             hashmap_entry_init(e, memhash(&key, sizeof(long)));
+ *             e.key = key;
+ *
+ *             printf("%s found\n", hashmap_get(&map, e, value) ? "" : "not");
+ *         }
+ *
+ *         if (!strcmp("end", action)) {
+ *             hashmap_free(&map, 1);
+ *             break;
+ *         }
+ *     }
+ * }
  */
 
-/* FNV-1 functions */
-
+/*
+ * Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
+ * http://www.isthe.com/chongo/tech/comp/fnv).
+ * `strhash` and `strihash` take 0-terminated strings, while `memhash` and
+ * `memihash` operate on arbitrary-length memory.
+ * `strihash` and `memihash` are case insensitive versions.
+ * `memihash_cont` is a variant of `memihash` that allows a computation to be
+ * continued with another chunk of data.
+ */
 extern unsigned int strhash(const char *buf);
 extern unsigned int strihash(const char *buf);
 extern unsigned int memhash(const void *buf, size_t len);
 extern unsigned int memihash(const void *buf, size_t len);
 extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len);
 
+/*
+ * Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
+ * for use in hash tables. Cryptographic hashes are supposed to have
+ * uniform distribution, so in contrast to `memhash()`, this just copies
+ * the first `sizeof(int)` bytes without shuffling any bits. Note that
+ * the results will be different on big-endian and little-endian
+ * platforms, so they should not be stored or transferred over the net.
+ */
 static inline unsigned int sha1hash(const unsigned char *sha1)
 {
 	/*
@@ -25,90 +131,255 @@ static inline unsigned int sha1hash(const unsigned char *sha1)
 	return hash;
 }
 
-/* data structures */
-
+/*
+ * struct hashmap_entry is an opaque structure representing an entry in the
+ * hash table, which must be used as first member of user data structures.
+ * Ideally it should be followed by an int-sized member to prevent unused
+ * memory on 64-bit systems due to alignment.
+ */
 struct hashmap_entry {
+	/*
+	 * next points to the next entry in case of collisions (i.e. if
+	 * multiple entries map to the same bucket)
+	 */
 	struct hashmap_entry *next;
+
+	/* entry's hash code */
 	unsigned int hash;
 };
 
+/*
+ * User-supplied function to test two hashmap entries for equality. Shall
+ * return 0 if the entries are equal.
+ *
+ * This function is always called with non-NULL `entry` and `entry_or_key`
+ * parameters that have the same hash code.
+ *
+ * When looking up an entry, the `key` and `keydata` parameters to hashmap_get
+ * and hashmap_remove are always passed as second `entry_or_key` and third
+ * argument `keydata`, respectively. Otherwise, `keydata` is NULL.
+ *
+ * When it is too expensive to allocate a user entry (either because it is
+ * large or varialbe sized, such that it is not on the stack), then the
+ * relevant data to check for equality should be passed via `keydata`.
+ * In this case `key` can be a stripped down version of the user key data
+ * or even just a hashmap_entry having the correct hash.
+ *
+ * The `hashmap_cmp_fn_data` entry is the pointer given in the init function.
+ */
 typedef int (*hashmap_cmp_fn)(const void *hashmap_cmp_fn_data,
 			      const void *entry, const void *entry_or_key,
 			      const void *keydata);
 
+/*
+ * struct hashmap is the hash table structure. Members can be used as follows,
+ * but should not be modified directly.
+ */
 struct hashmap {
 	struct hashmap_entry **table;
+
+	/* Stores the comparison function specified in `hashmap_init()`. */
 	hashmap_cmp_fn cmpfn;
 	const void *cmpfn_data;
-	unsigned int size, tablesize, grow_at, shrink_at;
-	unsigned disallow_rehash : 1;
-};
 
-struct hashmap_iter {
-	struct hashmap *map;
-	struct hashmap_entry *next;
-	unsigned int tablepos;
+	/* total number of entries (0 means the hashmap is empty) */
+	unsigned int size;
+
+	/*
+	 * tablesize is the allocated size of the hash table. A non-0 value
+	 * indicates that the hashmap is initialized. It may also be useful
+	 * for statistical purposes (i.e. `size / tablesize` is the current
+	 * load factor).
+	 */
+	unsigned int tablesize;
+
+	unsigned int grow_at;
+	unsigned int shrink_at;
+
+	/* See `hashmap_disallow_rehash`. */
+	unsigned disallow_rehash : 1;
 };
 
 /* hashmap functions */
 
+/*
+ * Initializes a hashmap structure.
+ *
+ * `map` is the hashmap to initialize.
+ *
+ * The `equals_function` can be specified to compare two entries for equality.
+ * If NULL, entries are considered equal if their hash codes are equal.
+ *
+ * The `equals_function_data` parameter can be used to provide additional data
+ * (a callback cookie) that will be passed to `equals_function` each time it
+ * is called. This allows a single `equals_function` to implement multiple
+ * comparison functions.
+ *
+ * If the total number of entries is known in advance, the `initial_size`
+ * parameter may be used to preallocate a sufficiently large table and thus
+ * prevent expensive resizing. If 0, the table is dynamically resized.
+ */
 extern void hashmap_init(struct hashmap *map,
 			 hashmap_cmp_fn equals_function,
 			 const void *equals_function_data,
 			 size_t initial_size);
+
+/*
+ * Frees a hashmap structure and allocated memory.
+ *
+ * If `free_entries` is true, each hashmap_entry in the map is freed as well
+ * using stdlibs free().
+ */
 extern void hashmap_free(struct hashmap *map, int free_entries);
 
 /* hashmap_entry functions */
 
+/*
+ * Initializes a hashmap_entry structure.
+ *
+ * `entry` points to the entry to initialize.
+ * `hash` is the hash code of the entry.
+ *
+ * The hashmap_entry structure does not hold references to external resources,
+ * and it is safe to just discard it once you are done with it (i.e. if
+ * your structure was allocated with xmalloc(), you can just free(3) it,
+ * and if it is on stack, you can just let it go out of scope).
+ */
 static inline void hashmap_entry_init(void *entry, unsigned int hash)
 {
 	struct hashmap_entry *e = entry;
 	e->hash = hash;
 	e->next = NULL;
 }
+
+/*
+ * Returns the hashmap entry for the specified key, or NULL if not found.
+ *
+ * `map` is the hashmap structure.
+ *
+ * `key` is a user data structure that starts with hashmap_entry that has at
+ * least been initialized with the proper hash code (via `hashmap_entry_init`).
+ *
+ * `keydata` is a data structure that holds just enough information to check
+ * for equality to a given entry.
+ *
+ * If the key data is variable-sized (e.g. a FLEX_ARRAY string) or quite large,
+ * it is undesirable to create a full-fledged entry structure on the heap and
+ * copy all the key data into the structure.
+ *
+ * In this case, the `keydata` parameter can be used to pass
+ * variable-sized key data directly to the comparison function, and the `key`
+ * parameter can be a stripped-down, fixed size entry structure allocated on the
+ * stack.
+ *
+ * If an entry with matching hash code is found, `key` and `keydata` are passed
+ * to `hashmap_cmp_fn` to decide whether the entry matches the key.
+ */
 extern void *hashmap_get(const struct hashmap *map, const void *key,
-		const void *keydata);
-extern void *hashmap_get_next(const struct hashmap *map, const void *entry);
-extern void hashmap_add(struct hashmap *map, void *entry);
-extern void *hashmap_put(struct hashmap *map, void *entry);
-extern void *hashmap_remove(struct hashmap *map, const void *key,
-		const void *keydata);
+			 const void *keydata);
 
+/*
+ * Returns the hashmap entry for the specified hash code and key data,
+ * or NULL if not found.
+ *
+ * `map` is the hashmap structure.
+ * `hash` is the hash code of the entry to look up.
+ *
+ * If an entry with matching hash code is found, `keydata` is passed to
+ * `hashmap_cmp_fn` to decide whether the entry matches the key. The
+ * `entry_or_key` parameter of `hashmap_cmp_fn` points to a hashmap_entry
+ * structure that should not be used in the comparison.
+ */
 static inline void *hashmap_get_from_hash(const struct hashmap *map,
-		unsigned int hash, const void *keydata)
+					  unsigned int hash,
+					  const void *keydata)
 {
 	struct hashmap_entry key;
 	hashmap_entry_init(&key, hash);
 	return hashmap_get(map, &key, keydata);
 }
 
+/*
+ * Returns the next equal hashmap entry, or NULL if not found. This can be
+ * used to iterate over duplicate entries (see `hashmap_add`).
+ *
+ * `map` is the hashmap structure.
+ * `entry` is the hashmap_entry to start the search from, obtained via a previous
+ * call to `hashmap_get` or `hashmap_get_next`.
+ */
+extern void *hashmap_get_next(const struct hashmap *map, const void *entry);
+
+/*
+ * Adds a hashmap entry. This allows to add duplicate entries (i.e.
+ * separate values with the same key according to hashmap_cmp_fn).
+ *
+ * `map` is the hashmap structure.
+ * `entry` is the entry to add.
+ */
+extern void hashmap_add(struct hashmap *map, void *entry);
+
+/*
+ * Adds or replaces a hashmap entry. If the hashmap contains duplicate
+ * entries equal to the specified entry, only one of them will be replaced.
+ *
+ * `map` is the hashmap structure.
+ * `entry` is the entry to add or replace.
+ * Returns the replaced entry, or NULL if not found (i.e. the entry was added).
+ */
+extern void *hashmap_put(struct hashmap *map, void *entry);
+
+/*
+ * Removes a hashmap entry matching the specified key. If the hashmap contains
+ * duplicate entries equal to the specified key, only one of them will be
+ * removed. Returns the removed entry, or NULL if not found.
+ *
+ * Argument explanation is the same as in `hashmap_get`.
+ */
+extern void *hashmap_remove(struct hashmap *map, const void *key,
+		const void *keydata);
+
+/*
+ * Returns the `bucket` an entry is stored in.
+ * Useful for multithreaded read access.
+ */
 int hashmap_bucket(const struct hashmap *map, unsigned int hash);
 
 /*
  * Disallow/allow rehashing of the hashmap.
- * This is useful if the caller knows that the hashmap
- * needs multi-threaded access.  The caller is still
- * required to guard/lock searches and inserts in a
- * manner appropriate to their usage.  This simply
- * prevents the table from being unexpectedly re-mapped.
+ * This is useful if the caller knows that the hashmap needs multi-threaded
+ * access.  The caller is still required to guard/lock searches and inserts
+ * in a manner appropriate to their usage.  This simply prevents the table
+ * from being unexpectedly re-mapped.
  *
- * If is up to the caller to ensure that the hashmap is
- * initialized to a reasonable size to prevent poor
- * performance.
+ * It is up to the caller to ensure that the hashmap is initialized to a
+ * reasonable size to prevent poor performance.
  *
- * When value=1, prevent future rehashes on adds and deleted.
- * When value=0, allow future rehahses.  This DOES NOT force
- * a rehash now.
+ * A call to allow rehashing does not force a rehash; that might happen
+ * with the next insert or delete.
  */
 static inline void hashmap_disallow_rehash(struct hashmap *map, unsigned value)
 {
 	map->disallow_rehash = value;
 }
 
-/* hashmap_iter functions */
+/*
+ * Used to iterate over all entries of a hashmap. Note that it is
+ * not safe to add or remove entries to the hashmap while
+ * iterating.
+ */
+struct hashmap_iter {
+	struct hashmap *map;
+	struct hashmap_entry *next;
+	unsigned int tablepos;
+};
 
+/* Initializes a `hashmap_iter` structure. */
 extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter);
+
+/* Returns the next hashmap_entry, or NULL if there are no more entries. */
 extern void *hashmap_iter_next(struct hashmap_iter *iter);
+
+/* Initializes the iterator and returns the first entry, if any. */
 static inline void *hashmap_iter_first(struct hashmap *map,
 		struct hashmap_iter *iter)
 {
@@ -116,8 +387,21 @@ static inline void *hashmap_iter_first(struct hashmap *map,
 	return hashmap_iter_next(iter);
 }
 
-/* string interning */
+/* String interning */
 
+/*
+ * Returns the unique, interned version of the specified string or data,
+ * similar to the `String.intern` API in Java and .NET, respectively.
+ * Interned strings remain valid for the entire lifetime of the process.
+ *
+ * Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
+ * strings / data must not be modified or freed.
+ *
+ * Interned strings are best used for short strings with high probability of
+ * duplicates.
+ *
+ * Uses a hashmap to store the pool of interned strings.
+ */
 extern const void *memintern(const void *data, size_t len);
 static inline const char *strintern(const char *string)
 {
-- 
2.13.0.31.g9b732c453e


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCHv3 2/3] patch-ids.c: use hashmap correctly
  2017-06-30 19:14     ` [PATCHv3 2/3] patch-ids.c: use hashmap correctly Stefan Beller
@ 2017-06-30 19:50       ` Junio C Hamano
  2017-07-05  9:00       ` Jeff King
  1 sibling, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2017-06-30 19:50 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, git, jrnieder

Stefan Beller <sbeller@google.com> writes:

> As eluded to in the previous patch, the code in patch-ids.c is using
> the hashmaps API wrong.
>
> Luckily we do not have a bug, as all hashmap functionality that we use
> here (hashmap_get) passes through the keydata.  If hashmap_get_next were
> to be used, a bug would occur as that passes NULL for the key_data.
>
> So instead use the hashmap API correctly and provide the caller required
> data in the compare function via the first argument that always gets
> passed and was setup via the hashmap_init function.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  patch-ids.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Nicely explained.  Thanks for polishing the series.

>
> diff --git a/patch-ids.c b/patch-ids.c
> index 815c115811..b4166b0f38 100644
> --- a/patch-ids.c
> +++ b/patch-ids.c
> @@ -35,10 +35,10 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
>   * the side of safety.  The actual value being negative does not have
>   * any significance; only that it is non-zero matters.
>   */
> -static int patch_id_cmp(const void *unused_cmp_data,
> +static int patch_id_cmp(struct diff_options *opt,
>  			struct patch_id *a,
>  			struct patch_id *b,
> -			struct diff_options *opt)
> +			const void *unused_keydata)
>  {
>  	if (is_null_oid(&a->patch_id) &&
>  	    commit_patch_id(a->commit, opt, &a->patch_id, 0))
> @@ -59,7 +59,7 @@ int init_patch_ids(struct patch_ids *ids)
>  	DIFF_OPT_SET(&ids->diffopts, RECURSIVE);
>  	diff_setup_done(&ids->diffopts);
>  	hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp,
> -		     NULL, 256);
> +		     &ids->diffopts, 256);
>  	return 0;
>  }
>  
> @@ -95,7 +95,7 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
>  	if (init_patch_id_entry(&patch, commit, ids))
>  		return NULL;
>  
> -	return hashmap_get(&ids->patches, &patch, &ids->diffopts);
> +	return hashmap_get(&ids->patches, &patch, NULL);
>  }
>  
>  struct patch_id *add_commit_patch_id(struct commit *commit,

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv3 2/3] patch-ids.c: use hashmap correctly
  2017-06-30 19:14     ` [PATCHv3 2/3] patch-ids.c: use hashmap correctly Stefan Beller
  2017-06-30 19:50       ` Junio C Hamano
@ 2017-07-05  9:00       ` Jeff King
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff King @ 2017-07-05  9:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, git, gitster, jrnieder

On Fri, Jun 30, 2017 at 12:14:06PM -0700, Stefan Beller wrote:

> As eluded to in the previous patch, the code in patch-ids.c is using
> the hashmaps API wrong.
> 
> Luckily we do not have a bug, as all hashmap functionality that we use
> here (hashmap_get) passes through the keydata.  If hashmap_get_next were
> to be used, a bug would occur as that passes NULL for the key_data.
> 
> So instead use the hashmap API correctly and provide the caller required
> data in the compare function via the first argument that always gets
> passed and was setup via the hashmap_init function.

Reviewing this a bit late, but it looks good to me. And I think the
explanation above nicely covers what is going on (and why it isn't a
bug).

-Peff

PS I think you meant "alluded" in the first sentence, unless you really
   were trying to escape the previous patch. :)

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-07-05  9:00 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-29  1:13 [PATCH 0/2] Introduce data field in hashmap and migrate docs to header Stefan Beller
2017-06-29  1:13 ` [PATCH 1/2] hashmap.h: compare function has access to a data field Stefan Beller
2017-06-29 18:06   ` Junio C Hamano
2017-06-29 18:11     ` Junio C Hamano
2017-06-29 18:20       ` Stefan Beller
2017-06-30 17:26         ` Junio C Hamano
2017-06-29  1:13 ` [PATCH 2/2] hashmap: migrate documentation from Documentation/technical into header Stefan Beller
2017-06-29 13:25   ` Jeff Hostetler
2017-06-29 21:18   ` Jonathan Nieder
2017-06-29 23:41     ` Stefan Beller
2017-06-29 23:53 ` [PATCHv2 0/2] Introduce data field in hashmap and migrate docs to header Stefan Beller
2017-06-29 23:53   ` [PATCHv2 1/2] hashmap.h: compare function has access to a data field Stefan Beller
2017-06-30 17:34     ` Junio C Hamano
2017-06-30 17:41       ` Stefan Beller
2017-06-30 18:47         ` Junio C Hamano
2017-06-30 18:57           ` Stefan Beller
2017-06-30 17:39     ` Junio C Hamano
2017-06-30 18:00       ` Stefan Beller
2017-06-30 18:40         ` Junio C Hamano
2017-06-29 23:53   ` [PATCHv2 2/2] hashmap: migrate documentation from Documentation/technical into header Stefan Beller
2017-06-30 19:14   ` [PATCHv3 0/3] Introduce data field in hashmap and migrate docs to header Stefan Beller
2017-06-30 19:14     ` [PATCHv3 1/3] hashmap.h: compare function has access to a data field Stefan Beller
2017-06-30 19:14     ` [PATCHv3 2/3] patch-ids.c: use hashmap correctly Stefan Beller
2017-06-30 19:50       ` Junio C Hamano
2017-07-05  9:00       ` Jeff King
2017-06-30 19:14     ` [PATCHv3 3/3] hashmap: migrate documentation from Documentation/technical into header Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).