bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed
* [PATCH] hamt: New module.
@ 2021-04-03  9:25 Marc Nieper-Wißkirchen
  2021-04-03 17:02 ` Bruno Haible
  2021-04-05 13:02 ` Bruno Haible
  0 siblings, 2 replies; 9+ messages in thread
From: Marc Nieper-Wißkirchen @ 2021-04-03  9:25 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Marc Nieper-Wißkirchen

From: Marc Nieper-Wißkirchen <marc@nieper-wisskirchen.de>

This module provides (persistent) hash array mapped tries.
* MODULES.html.sh: Add hamt.
* lib/hamt.c: New file.
* lib/hamt.h: New file.
* modules/hamt: New file.
* modules/hamt-tests: New file.
* tests/test-hamt.c: New file.
---
 ChangeLog          |   11 +
 MODULES.html.sh    |    1 +
 lib/hamt.c         | 1083 ++++++++++++++++++++++++++++++++++++++++++++
 lib/hamt.h         |  253 +++++++++++
 modules/hamt       |   29 ++
 modules/hamt-tests |   11 +
 tests/test-hamt.c  |  378 ++++++++++++++++
 7 files changed, 1766 insertions(+)
 create mode 100644 lib/hamt.c
 create mode 100644 lib/hamt.h
 create mode 100644 modules/hamt
 create mode 100644 modules/hamt-tests
 create mode 100644 tests/test-hamt.c

diff --git a/ChangeLog b/ChangeLog
index 6224aa7cb..a8b9aec60 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2020-04-03  Marc Nieper-Wißkirchen  <marc@nieper-wisskirchen.de>
+
+	hamt: New module.
+	This module provides (persistent) hash array mapped tries.
+	* MODULES.html.sh: Add hamt.
+	* lib/hamt.c: New file.
+	* lib/hamt.h: New file.
+	* modules/hamt: New file.
+	* modules/hamt-tests: New file.
+	* tests/test-hamt.c: New file.
+
 2021-04-02  Bruno Haible  <bruno@clisp.org>
 
 	strtoul, strtoll, strtoull: Fix compilation warning.
diff --git a/MODULES.html.sh b/MODULES.html.sh
index dad6c2a77..646f30b72 100755
--- a/MODULES.html.sh
+++ b/MODULES.html.sh
@@ -2027,6 +2027,7 @@ func_all_modules ()
   func_module hash-pjw
   func_module hash-pjw-bare
   func_module hash
+  func_module hamt
   func_module readline
   func_module readtokens
   func_module readtokens0
diff --git a/lib/hamt.c b/lib/hamt.c
new file mode 100644
index 000000000..204c2f069
--- /dev/null
+++ b/lib/hamt.c
@@ -0,0 +1,1083 @@
+/* (Persistent) hash array mapped tries.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Marc Nieper-Wißkirchen <marc@nieper-wisskirchen.de>, 2021.  */
+
+#include <config.h>
+#define _GL_HAMT_INLINE _GL_EXTERN_INLINE
+#include "hamt.h"
+
+#include <flexmember.h>
+#include <inttypes.h>
+#include <stdlib.h>
+#include "count-one-bits.h"
+#include "verify.h"
+#include "xalloc.h"
+
+/* Reference counters can be shared between different threads if the
+   entry they belong to is shared between different threads.
+   Operations on them therefore have to be atomic so that no invalid
+   state is observable.
+
+   A thread must not modify an entry or its children (!) if its
+   reference count implies that the entry is shared by at least two
+   hamts.  */
+typedef
+#if GL_HAMT_THREAD_SAFE
+_Atomic
+#endif
+size_t ref_counter;
+
+/***************/
+/* Entry Types */
+/***************/
+
+/* Leaf nodes are of type element.  Non-leaf nodes are either subtries
+   or, if at maximal depth, buckets.  The entry type is stored in the
+   lower two bits of the reference counter, whereas reference counters
+   for entries are incremented and decremented in multiples of 4.  */
+enum entry_type
+{
+  element_entry = 0,
+  subtrie_entry = 1,
+  bucket_entry = 2
+};
+
+/* Return the type an entry.  */
+static enum entry_type
+entry_type (const Hamt_entry *entry)
+{
+  return entry->ref_count & 3;
+}
+
+/********************/
+/* Reference Counts */
+/********************/
+
+/* Initialize the reference counter, storing its type.  */
+static void
+init_ref_counter (ref_counter *counter, enum entry_type type)
+{
+  *counter = 4 + type;
+}
+
+/* Increase the reference counter of an entry.  */
+static void
+inc_ref_counter (ref_counter *counter)
+{
+  *counter += 4;
+}
+
+/* Decrease the entry reference counter.  Return false if the entry
+   can be deleted.  */
+static bool
+dec_ref_counter (ref_counter *counter)
+{
+  *counter -= 4;
+  return *counter >= 4;
+}
+
+/**************/
+/* Structures */
+/**************/
+
+/* Different generations of a hamt share a function table.  */
+struct function_table
+{
+  Hamt_hasher *hasher;
+  Hamt_comparator *comparator;
+  Hamt_freer *freer;
+  ref_counter ref_count;
+};
+
+/* Different generations of a hamt share subtries.  A singleton
+   subtrie is modelled as a single element.  */
+struct subtrie
+{
+  ref_counter ref_count;
+  /* Nodes carry labels from 0 to 31.  The i-th bit in MAP is set if
+     the node labelled i is present.  */
+  uint32_t map;
+  /* The length of the NODES array is the population count of MAP.
+     The order of the nodes corresponds to the order of the 1-bits in
+     MAP.  */
+  Hamt_entry *nodes[FLEXIBLE_ARRAY_MEMBER];
+};
+
+/* Buckets are used when different elements have the same hash values.  */
+struct bucket
+{
+  ref_counter ref_counter;
+  size_t elt_count;
+  Hamt_entry *elts[FLEXIBLE_ARRAY_MEMBER];
+};
+
+/* A hamt consists of its function table and the root entry.  */
+struct hamt
+{
+  struct function_table *functions;
+  /* The root entry is NULL for an empty HAMT.  */
+  Hamt_entry *root;
+};
+
+/*******************/
+/* Function Tables */
+/*******************/
+
+/* Allocate and initialize a function table.  */
+static struct function_table *
+create_function_table (Hamt_hasher *hasher, Hamt_comparator *comparator,
+                       Hamt_freer *freer)
+{
+  struct function_table *functions = XMALLOC (struct function_table);
+  functions->hasher = hasher;
+  functions->comparator = comparator;
+  functions->freer = freer;
+  functions->ref_count = 1;
+  return functions;
+}
+
+/* Increment the reference count and return the function table. */
+static struct function_table *
+copy_function_table (struct function_table *function_table)
+{
+  ++function_table->ref_count;
+  return function_table;
+}
+
+/* Decrease the reference count and free the function table if the
+   reference count drops to zero.  */
+static void
+free_function_table (struct function_table *function_table)
+{
+  if (--function_table->ref_count)
+    return;
+  free (function_table);
+}
+
+/************/
+/* Elements */
+/************/
+
+/* Return an element's hash.  */
+static size_t
+hash_element (const struct function_table *functions, const Hamt_entry *elt)
+{
+  return functions->hasher (elt);
+}
+
+/* Compare two elements.  */
+static bool
+compare_elements (const struct function_table *functions,
+                  const Hamt_entry *elt1, const Hamt_entry *elt2)
+{
+  return functions->comparator (elt1, elt2);
+}
+
+/* Free an element.  */
+static void
+free_element (const struct function_table *functions, Hamt_entry *elt)
+{
+  if (dec_ref_counter (&elt->ref_count))
+    return;
+  functions->freer (elt);
+}
+
+/* Return the initialized element.  */
+static Hamt_entry *
+init_element (Hamt_entry *elt)
+{
+  init_ref_counter (&elt->ref_count, element_entry);
+  return elt;
+}
+
+/***********/
+/* Buckets */
+/***********/
+
+/* Allocate a partially initialized bucket with a given number of elements.  */
+static struct bucket *
+alloc_bucket (size_t elt_count)
+{
+  struct bucket *bucket
+    = xmalloc (FLEXSIZEOF (struct bucket, elts,
+                           sizeof (Hamt_entry) * elt_count));
+  init_ref_counter (&bucket->ref_counter, bucket_entry);
+  bucket->elt_count = elt_count;
+  return bucket;
+}
+
+/***********/
+/* Entries */
+/***********/
+
+/* Return true if the entry is shared between two or more hamts.
+   Otherwise, return false.
+
+   This procedure is used for destructive updates.  If an entry and
+   all its parents are not shared, it can be updated destructively
+   without effecting other hamts.  */
+static bool
+is_shared (const Hamt_entry *entry)
+{
+  return entry->ref_count >= 8;
+}
+
+/* Calculate and return the number of nodes in a subtrie.  */
+static int
+trienode_count (const struct subtrie *subtrie)
+{
+  return count_one_bits (subtrie->map); /* In Gnulib, we assume that
+                                           an integer has at least 32
+                                           bits. */
+}
+
+/* Allocate a partially initialized subtrie with a given number of nodes.  */
+static struct subtrie *
+alloc_subtrie (int node_count)
+{
+  struct subtrie *subtrie
+    = xmalloc (FLEXSIZEOF (struct subtrie, nodes,
+                           sizeof (Hamt_entry) * node_count));
+  init_ref_counter (&subtrie->ref_count, subtrie_entry);
+  return subtrie;
+}
+
+/* Return a conceptually copy of an entry.  */
+static Hamt_entry *
+copy_entry (Hamt_entry *entry)
+{
+  inc_ref_counter (&entry->ref_count);
+  return entry;
+}
+
+/* Return a new bucket that has the j-th element replaced.  */
+static struct bucket *
+replace_bucket_element (struct bucket *bucket, int j, Hamt_entry *elt)
+{
+  int n = bucket->elt_count;
+  struct bucket *new_bucket = alloc_bucket (n);
+  for (int k = 0; k < n; ++k)
+    if (k == j)
+      new_bucket->elts[k] = elt;
+    else
+      new_bucket->elts[k] = copy_entry (bucket->elts[k]);
+  return new_bucket;
+}
+
+/* Return a new subtrie that has the j-th node replaced.  */
+static struct subtrie *
+replace_entry (struct subtrie *subtrie, int j, Hamt_entry *entry)
+{
+  int n = trienode_count (subtrie);
+  struct subtrie *new_subtrie = alloc_subtrie (n);
+  new_subtrie->map = subtrie->map;
+  for (int k = 0; k < n; ++k)
+    if (k == j)
+      new_subtrie->nodes[k] = entry;
+    else
+      new_subtrie->nodes[k] = copy_entry (subtrie->nodes[k]);
+  return new_subtrie;
+}
+
+/* Return a new subtrie that has an entry labelled i inserted at
+   the j-th position.  */
+static struct subtrie *
+insert_entry (struct subtrie *subtrie, int i, int j, Hamt_entry *entry)
+{
+  int n = trienode_count (subtrie) + 1;
+  struct subtrie *new_subtrie = alloc_subtrie (n);
+  new_subtrie->map = subtrie->map | (1 << i);
+  for (int k = 0; k < n; ++k)
+    {
+      if (k < j)
+        new_subtrie->nodes[k] = copy_entry (subtrie->nodes[k]);
+      else if (k > j)
+        new_subtrie->nodes[k] = copy_entry (subtrie->nodes[k - 1]);
+      else
+        new_subtrie->nodes[k] = entry;
+    }
+  return new_subtrie;
+}
+
+/* Return a new entry that has the entry labelled i removed from
+   position j.  */
+static Hamt_entry *
+remove_subtrie_entry (struct subtrie *subtrie, int i, int j)
+{
+  int n = trienode_count (subtrie) - 1;
+  if (n == 1)
+    {
+      if (j == 0)
+        return copy_entry (subtrie->nodes[1]);
+      return copy_entry (subtrie->nodes[0]);
+    }
+  struct subtrie *new_subtrie = alloc_subtrie (n);
+  new_subtrie->map = subtrie->map & ~(1 << i);
+  for (int k = 0; k < n; ++k)
+    {
+      if (k < j)
+        new_subtrie->nodes[k] = copy_entry (subtrie->nodes[k]);
+      else if (k >= j)
+        new_subtrie->nodes[k] = copy_entry (subtrie->nodes[k + 1]);
+    }
+  return (Hamt_entry *) new_subtrie;
+}
+
+/* Return a new entry that has the entry at position j removed.  */
+static Hamt_entry *
+remove_bucket_entry (struct bucket *bucket, int j)
+{
+  int n = bucket->elt_count - 1;
+  if (n == 1)
+    {
+      if (j == 0)
+        return copy_entry (bucket->elts[1]);
+      return copy_entry (bucket->elts[0]);
+    }
+  struct bucket *new_bucket = alloc_bucket (n);
+  for (int k = 0; k < n; ++k)
+    {
+      if (k < j)
+        new_bucket->elts[k] = copy_entry (bucket->elts[k]);
+      else if (k >= j)
+        new_bucket->elts[k] = copy_entry (bucket->elts[k + 1]);
+    }
+  return (Hamt_entry *) new_bucket;
+}
+
+/****************************/
+/* Creation and Destruction */
+/****************************/
+
+/* Create a new, empty hash array mapped trie.  */
+Hamt *
+hamt_create (Hamt_hasher *hasher, Hamt_comparator *comparator,
+             Hamt_freer *freer)
+{
+  struct function_table *functions
+    = create_function_table (hasher, comparator, freer);
+  Hamt *hamt = XMALLOC (Hamt);
+  hamt->functions = functions;
+  hamt->root = NULL;
+  return hamt;
+}
+
+/* Return a copy of the hamt.  */
+Hamt *
+hamt_copy (Hamt *hamt)
+{
+  Hamt *new_hamt = XMALLOC (Hamt);
+  new_hamt->functions = copy_function_table (hamt->functions);
+  new_hamt->root = hamt->root == NULL ? NULL : copy_entry (hamt->root);
+  return new_hamt;
+}
+
+/* Free a bucket.  */
+static void
+free_bucket (struct function_table const *functions, struct bucket *bucket)
+{
+  if (dec_ref_counter (&bucket->ref_counter))
+    return;
+  size_t elt_count = bucket->elt_count;
+  Hamt_entry *const *elts = bucket->elts;
+  for (size_t i = 0; i < elt_count; ++i)
+    free_element (functions, elts[i]);
+  free (bucket);
+}
+
+/* Forward declaration.  */
+static void free_subtrie (struct function_table const *functions,
+                          struct subtrie *subtrie);
+
+/* Free an entry.  */
+static void
+free_entry (struct function_table const *functions, Hamt_entry *entry)
+{
+  switch (entry_type (entry))
+    {
+    case element_entry:
+      free_element (functions, entry);
+      break;
+    case subtrie_entry:
+      free_subtrie (functions, (struct subtrie *) entry);
+      break;
+    case bucket_entry:
+      free_bucket (functions, (struct bucket *) entry);
+      break;
+    default:
+      assume (0);
+    }
+}
+
+/* Free a trie recursively.  */
+static void
+free_subtrie (struct function_table const *functions, struct subtrie *subtrie)
+{
+  if (dec_ref_counter (&subtrie->ref_count))
+    return;
+  int n = trienode_count (subtrie);
+  Hamt_entry **node_ptr = subtrie->nodes;
+  for (int j = 0; j < n; ++j)
+    free_entry (functions, *node_ptr++);
+  free (subtrie);
+}
+
+/* Free a hamt.  */
+void
+hamt_free (Hamt *hamt)
+{
+  if (hamt->root != NULL)
+    free_entry (hamt->functions, hamt->root);
+  free_function_table (hamt->functions);
+  free (hamt);
+}
+
+/**********/
+/* Lookup */
+/**********/
+
+/* Lookup an element in a bucket.  */
+static Hamt_entry *
+bucket_lookup (const struct function_table *functions,
+               const struct bucket *bucket, const void *elt)
+{
+  size_t elt_count = bucket->elt_count;
+  Hamt_entry *const *elts = bucket->elts;
+  for (size_t i = 0; i < elt_count; ++i)
+    {
+      if (compare_elements (functions, elt, elts[i]))
+        return *elts;
+    }
+  return NULL;
+}
+
+/* Forward declaration.  */
+static Hamt_entry *entry_lookup (const struct function_table *functions,
+                                 Hamt_entry *entry,
+                                 const void *elt, size_t hash);
+
+/* Lookup an element in a bucket.  */
+static Hamt_entry *
+subtrie_lookup (const struct function_table *functions,
+                const struct subtrie *subtrie, const void *elt,
+                size_t hash)
+{
+  uint32_t map = subtrie->map;
+  int i = hash & 31;
+
+  if (! (map & (1 << i)))
+    return NULL;
+
+  int j = i == 0 ? 0 : count_one_bits (map << (32 - i));
+  return entry_lookup (functions, subtrie->nodes[j], elt, hash >> 5);
+}
+
+/* Lookup an element in an entry.  */
+static Hamt_entry *
+entry_lookup (const struct function_table *functions, Hamt_entry *entry,
+              const void *elt, size_t hash)
+{
+  switch (entry_type (entry))
+    {
+    case element_entry:
+      if (compare_elements (functions, elt, entry))
+        return entry;
+      return NULL;
+    case subtrie_entry:
+      return subtrie_lookup (functions, (struct subtrie *) entry, elt, hash);
+    case bucket_entry:
+      return bucket_lookup (functions, (struct bucket *) entry, elt);
+    default:
+      assume (0);
+    }
+}
+
+/* If ELT matches an entry in HAMT, return this entry.  Otherwise,
+   return NULL.  */
+Hamt_entry *
+hamt_lookup (const Hamt *hamt, const void *elt)
+{
+  if (hamt->root == NULL)
+    return NULL;
+
+  return entry_lookup (hamt->functions, hamt->root, elt,
+                       hash_element (hamt->functions, elt));
+}
+
+/**************************/
+/* Insertion and Deletion */
+/**************************/
+
+/* Create a bucket populated with two elements.  */
+static struct bucket *
+create_populated_bucket (Hamt_entry *elt1, Hamt_entry *elt2)
+{
+  struct bucket *bucket = alloc_bucket (2);
+  bucket->elts[0] = elt1;
+  bucket->elts[1] = elt2;
+  return bucket;
+}
+
+/* Create a chain of subtrie nodes so that the resulting trie is
+   populated with exactly two elements.  */
+static Hamt_entry *
+create_populated_subtrie (Hamt_entry *elt1, Hamt_entry *elt2, size_t hash1,
+                          size_t hash2, int depth)
+{
+  if (depth >= _GL_HAMT_MAX_DEPTH)
+    return (Hamt_entry *) create_populated_bucket (elt1, elt2);
+
+  struct subtrie *subtrie;
+  int i1 = hash1 & 31;
+  int i2 = hash2 & 31;
+  if (i1 != i2)
+    {
+      subtrie = alloc_subtrie (2);
+      subtrie->map = (1 << i1) | (1 << i2);
+      if (i1 < i2)
+        {
+          subtrie->nodes[0] = elt1;
+          subtrie->nodes[1] = elt2;
+        }
+      else
+        {
+          subtrie->nodes[0] = elt2;
+          subtrie->nodes[1] = elt1;
+        }
+    }
+  else
+    {
+      subtrie = alloc_subtrie (1);
+      subtrie->map = 1 << i1;
+      subtrie->nodes[0]
+        = create_populated_subtrie (elt1, elt2, hash1 >> 5, hash2 >> 5,
+                                    depth + 1);
+    }
+  return (Hamt_entry *) subtrie;
+}
+
+/* Insert or replace an element in a bucket.  */
+static struct bucket *
+bucket_insert (const struct function_table *functions, struct bucket *bucket,
+               Hamt_entry **elt_ptr, bool replace, bool shared)
+{
+  size_t elt_count = bucket->elt_count;
+  Hamt_entry **elts = bucket->elts;
+  for (size_t i = 0; i < elt_count; ++i)
+    {
+      if (compare_elements (functions, *elt_ptr, elts[i]))
+        {
+          if (replace)
+            {
+              if (shared)
+                {
+                  struct bucket *new_bucket
+                    = replace_bucket_element (bucket, i,
+                                              copy_entry (*elt_ptr));
+                  *elt_ptr = elts[i];
+                  return new_bucket;
+                }
+              free_element (functions, elts[i]);
+              elts[i] = copy_entry (*elt_ptr);
+              return bucket;
+            }
+          *elt_ptr = *elt_ptr == elts[i] ? NULL : elts[i];
+          return bucket;
+        }
+    }
+  struct bucket *new_bucket = alloc_bucket (elt_count + 1);
+  new_bucket->elts[0] = copy_entry (*elt_ptr);
+  for (size_t i = 0; i < elt_count; ++i)
+    {
+      new_bucket->elts[i + 1] = copy_entry (bucket->elts[i]);
+    }
+  if (replace)
+    *elt_ptr = NULL;
+  return new_bucket;
+}
+
+/* Forward declaration.  */
+static Hamt_entry *entry_insert (const struct function_table *functions,
+                                 Hamt_entry *subtrie, Hamt_entry **elt_ptr,
+                                 size_t hash, int depth, bool replace,
+                                 bool shared);
+
+/* Insert or replace an element in a subtrie.  */
+static struct subtrie *
+subtrie_insert (const struct function_table *functions, struct subtrie *subtrie,
+                Hamt_entry **elt_ptr, size_t hash, int depth, bool replace,
+                bool shared)
+{
+  uint32_t map = subtrie->map;
+  int i = hash & 31;
+  int j = i == 0 ? 0 : count_one_bits (map << (32 - i));
+  if (map & (1 << i))
+    {
+      Hamt_entry *entry = subtrie->nodes[j];
+      Hamt_entry *new_entry
+        = entry_insert (functions, entry, elt_ptr, hash >> 5, depth + 1,
+                        replace, shared);
+      if (new_entry != entry)
+        {
+          if (shared)
+            return replace_entry (subtrie, j, new_entry);
+          free_entry (functions, entry);
+          subtrie->nodes[j] = new_entry;
+        }
+      return subtrie;
+    }
+  Hamt_entry *entry = copy_entry (*elt_ptr);
+  if (replace)
+    *elt_ptr = NULL;
+  return insert_entry (subtrie, i, j, entry);
+}
+
+/* Insert or replace an element in an entry.
+
+   REPLACE is true if we want replace instead of insert semantics.
+   SHARED is false if a destructive update has been requested and none
+   of the parent nodes are shared.  If an entry cannot be inserted
+   because the same entry with respect to pointer equality is already
+   present, *ELT_PTR is set to NULL to mark this special case.  */
+static Hamt_entry *
+entry_insert (const struct function_table *functions, Hamt_entry *entry,
+              Hamt_entry **elt_ptr, size_t hash, int depth, bool replace,
+              bool shared)
+{
+  shared |= is_shared (entry);
+  switch (entry_type (entry))
+    {
+    case element_entry:
+      if (compare_elements (functions, *elt_ptr, entry))
+        {
+          if (replace)
+            {
+              if (shared)
+                {
+                  Hamt_entry *new_entry = copy_entry (*elt_ptr);
+                  *elt_ptr = entry;
+                  return new_entry;
+                }
+              return copy_entry (*elt_ptr);
+            }
+          *elt_ptr = *elt_ptr == entry ? NULL : entry;
+          return entry;
+        }
+      Hamt_entry *new_entry = copy_entry (*elt_ptr);
+      if (replace)
+        *elt_ptr = NULL;
+      return create_populated_subtrie (new_entry, copy_entry (entry), hash,
+                                       (hash_element (functions, entry)
+                                        >> (5 * depth)), depth);
+    case subtrie_entry:
+      return (Hamt_entry *)
+        subtrie_insert (functions, (struct subtrie *) entry, elt_ptr, hash,
+                        depth, replace, shared);
+    case bucket_entry:
+      return (Hamt_entry *)
+        bucket_insert (functions, (struct bucket *) entry, elt_ptr, replace,
+                       shared);
+    default:
+      assume (0);
+    }
+}
+
+/* Insert or replace an element in the root.  */
+static Hamt_entry *
+root_insert (const struct function_table *functions, Hamt_entry *root,
+             Hamt_entry **elt_ptr, bool replace, bool shared)
+{
+  if (root == NULL)
+    return copy_entry (*elt_ptr);
+
+ return entry_insert (functions, root, elt_ptr,
+                      hash_element (functions, *elt_ptr), 0, replace, shared);
+}
+
+/* If *ELT_PTR matches an element already in HAMT, set *ELT_PTR to the
+   element from the table and return HAMT.  Otherwise, insert *ELT_PTR
+   into a copy of the HAMT and return the copy.  */
+Hamt *
+hamt_insert (Hamt *hamt, Hamt_entry **elt_ptr)
+{
+  Hamt_entry *elt = *elt_ptr;
+  Hamt_entry *new_entry = root_insert (hamt->functions, hamt->root,
+                                       elt_ptr, false, true);
+  if (*elt_ptr == NULL)
+    *elt_ptr = elt;
+
+  if (new_entry == hamt->root)
+    return hamt;
+
+  Hamt *new_hamt = XMALLOC (Hamt);
+  new_hamt->functions = copy_function_table (hamt->functions);
+  new_hamt->root = new_entry;
+  return new_hamt;
+}
+
+/* Insert *ELT_PTR into a copy of HAMT and return the copy.  If an
+   existing element was replaced, set *ELT_PTR to this element, and to
+   NULL otherwise. */
+Hamt *
+hamt_replace (Hamt *hamt, Hamt_entry **elt_ptr)
+{
+  Hamt *new_hamt = XMALLOC (Hamt);
+  new_hamt->functions = copy_function_table (hamt->functions);
+  new_hamt->root = root_insert (hamt->functions, hamt->root, elt_ptr, true,
+                                true);
+  return new_hamt;
+}
+
+/* Remove an element in a bucket if found.  */
+static Hamt_entry *
+bucket_remove (const struct function_table *functions, struct bucket *bucket,
+               Hamt_entry **elt_ptr)
+{
+  size_t elt_count = bucket->elt_count;
+  Hamt_entry *const *elts = bucket->elts;
+  for (size_t i = 0; i < elt_count; ++i)
+    {
+      if (compare_elements (functions, *elt_ptr, elts[i]))
+        {
+          *elt_ptr = elts[i];
+          return remove_bucket_entry (bucket, i);
+        }
+    }
+  *elt_ptr = NULL;
+  return (Hamt_entry *) bucket;
+}
+
+/* Forward declaration.  */
+static Hamt_entry *entry_remove (const struct function_table *functions,
+                                 Hamt_entry *entry, Hamt_entry **elt_ptr,
+                                 size_t hash, int depth, bool shared);
+
+/* Remove an element in a subtrie if found.  */
+static Hamt_entry *
+subtrie_remove (const struct function_table *functions, struct subtrie *subtrie,
+                Hamt_entry **elt_ptr, size_t hash, int depth, bool shared)
+{
+  uint32_t map = subtrie->map;
+  int i = hash & 31;
+  int j = i == 0 ? 0 : count_one_bits (map << (32 - i));
+  if (map & (1 << i))
+    {
+      Hamt_entry *entry = subtrie->nodes[j];
+      Hamt_entry *new_entry
+        = entry_remove (functions, entry, elt_ptr, hash >> 5, depth + 1,
+                        shared);
+      if (new_entry == NULL)
+        return remove_subtrie_entry (subtrie, i, j);
+      if (new_entry != entry)
+        {
+          if (shared)
+            return (Hamt_entry *) replace_entry (subtrie, j, new_entry);
+          free_entry (functions, entry);
+          subtrie->nodes[j] = new_entry;
+        }
+      return (Hamt_entry *) subtrie;
+    }
+  *elt_ptr = NULL;
+  return (Hamt_entry *) subtrie;
+}
+
+/* Remove an element in an entry if found.
+
+   SHARED is false if a destructive update has been requested and none
+   of the parent nodes are shared.  If an entry cannot be
+   removed, *ELT_PTR is set to NULL.  */
+static Hamt_entry *
+entry_remove (const struct function_table *functions, Hamt_entry *entry,
+              Hamt_entry **elt_ptr, size_t hash, int depth, bool shared)
+{
+  shared |= is_shared (entry);
+  switch (entry_type (entry))
+    {
+    case element_entry:
+      if (compare_elements (functions, *elt_ptr, entry))
+        {
+          *elt_ptr = entry;
+          return NULL;
+        }
+      *elt_ptr = NULL;
+      return entry;
+    case subtrie_entry:
+      return subtrie_remove (functions, (struct subtrie *) entry, elt_ptr, hash,
+                             depth, shared);
+    case bucket_entry:
+      return bucket_remove (functions, (struct bucket *) entry, elt_ptr);
+    default:
+      assume (0);
+    }
+}
+
+/* Remove an element in the root.  */
+static Hamt_entry *
+root_remove (const struct function_table *functions, Hamt_entry *root,
+             Hamt_entry **elt_ptr, bool shared)
+{
+  if (root == NULL)
+    return NULL;
+
+  return entry_remove (functions, root, elt_ptr,
+                       hash_element (functions, *elt_ptr), 0, shared);
+}
+
+/* If *ELT_PTR matches an element already in HAMT, set *ELT_PTR to the
+element from the table, remove the element from a copy of the hamt and
+return the copy.  Otherwise, return HAMT.  */
+Hamt *
+hamt_remove (Hamt *hamt, Hamt_entry **elt_ptr)
+{
+  Hamt_entry *elt = *elt_ptr;
+  Hamt_entry *new_entry = root_remove (hamt->functions, hamt->root, elt_ptr,
+                                       true);
+  if (*elt_ptr == NULL)
+    *elt_ptr = elt;
+
+  if (new_entry == hamt->root)
+    return hamt;
+
+  Hamt *new_hamt = XMALLOC (Hamt);
+  new_hamt->functions = copy_function_table (hamt->functions);
+  new_hamt->root = new_entry;
+  return new_hamt;
+}
+
+/*************/
+/* Iteration */
+/*************/
+
+/* Walk a bucket.  */
+static size_t
+bucket_do_while (const struct bucket *bucket, Hamt_processor *proc, void *data,
+                 bool *success)
+{
+  size_t cnt = 0;
+  size_t elt_count = bucket->elt_count;
+  Hamt_entry *const *elts = bucket->elts;
+  for (size_t i = 0; i < elt_count; ++i)
+    {
+      *success = proc (elts[i], data);
+      if (!success)
+        return cnt;
+      ++cnt;
+    }
+  return cnt;
+}
+
+/* Forward declaration.  */
+static size_t entry_do_while (Hamt_entry *entry, Hamt_processor *proc,
+                              void *data, bool *success);
+
+/* Walk a subtrie.  */
+static size_t subtrie_do_while (const struct subtrie *subtrie,
+                                Hamt_processor *proc, void *data, bool *success)
+{
+  size_t cnt = 0;
+  int n = trienode_count (subtrie);
+  Hamt_entry *const *node_ptr = subtrie->nodes;
+  for (int j = 0; j < n; ++j)
+    {
+      cnt += entry_do_while (*node_ptr++, proc, data, success);
+      if (!success)
+        return cnt;
+    }
+  return cnt;
+}
+
+/* Walk an entry.  */
+static size_t
+entry_do_while (Hamt_entry *entry, Hamt_processor *proc, void *data,
+                bool *success)
+{
+  switch (entry_type (entry))
+    {
+    case element_entry:
+      *success = proc (entry, data);
+      return *success ? 1 : 0;
+    case subtrie_entry:
+      return subtrie_do_while ((struct subtrie *) entry, proc, data, success);
+    case bucket_entry:
+      return bucket_do_while ((struct bucket *) entry, proc, data, success);
+    default:
+      assume (0);
+    }
+}
+
+/* Call PROC for every entry of the hamt until it returns false.  The
+   first argument of PROC is the entry, the second argument is the value
+   of DATA as received.  Return the number of calls that returned
+   true.  */
+size_t
+hamt_do_while (const Hamt *hamt, Hamt_processor *proc, void *data)
+{
+  if (hamt->root == NULL)
+    return 0;
+
+  bool success = true;
+  return entry_do_while (hamt->root, proc, data, &success);
+}
+
+/* Create an iterator with a copy of the hamt.
+
+   For a valid iterator state the following is true: If DEPTH is
+   negative, the iterator is exhausted.  Otherwise, ENTRY[DEPTH] is
+   either the element that is produced next, or a bucket such that the
+   element at POSITION of the bucket is produced next.
+*/
+Hamt_iterator
+hamt_iterator (Hamt *hamt)
+{
+  Hamt_iterator iter;
+  iter.hamt = hamt_copy (hamt);
+  Hamt_entry *entry = hamt->root;
+  if (entry == NULL)
+    {
+      iter.depth = -1;
+      return iter;
+    }
+  iter.depth = 0;
+  iter.path = 0;
+  iter.position = 0;
+  while (iter.entry[iter.depth] = entry, entry_type (entry) == subtrie_entry)
+    {
+      const struct subtrie *subtrie = (const struct subtrie *) entry;
+      ++iter.depth;
+      entry = subtrie->nodes[0];
+    }
+  return iter;
+}
+
+/* Free the iterator and the associated hamt copy.  */
+void
+hamt_iterator_free (Hamt_iterator *iter)
+{
+  hamt_free (iter->hamt);
+}
+
+/* Create a copy of the complete iterator state, including the
+   hamt.  */
+Hamt_iterator
+hamt_iterator_copy (Hamt_iterator *iter)
+{
+  Hamt_iterator new_iter = *iter;
+  new_iter.hamt = hamt_copy (iter->hamt);
+  return new_iter;
+}
+
+/* Return the number of significant bits at DEPTH.  */
+static int
+bit_width (int depth)
+{
+  if (depth < _GL_HAMT_MAX_DEPTH - 1)
+    return 5;
+  return SIZE_WIDTH - 5 * (_GL_HAMT_MAX_DEPTH - 1);
+}
+
+/* The actual iteration.  */
+bool
+hamt_iterator_next (Hamt_iterator *iter, Hamt_entry **elt_ptr)
+{
+  int depth = iter->depth;
+  if (depth < 0)
+    return false;
+
+  Hamt_entry *entry = iter->entry[depth];
+  if (entry_type (entry) == bucket_entry)
+    {
+      struct bucket *bucket = (struct bucket *) entry;
+      *elt_ptr = bucket->elts[iter->position];
+      if (++iter->position < bucket->elt_count)
+        return true;
+    }
+  else
+    *elt_ptr = entry;
+
+  struct subtrie *subtrie;
+  while (iter->depth-- > 0)
+    {
+      int width = bit_width (iter->depth);
+      int j = (iter->path & ((1 << width) - 1)) + 1;
+      subtrie = (struct subtrie *) iter->entry[iter->depth];
+      if (j < trienode_count (subtrie))
+        {
+          ++iter->path;
+          while (iter->entry[++iter->depth] = subtrie->nodes[j],
+                 entry_type (iter->entry[iter->depth]) == subtrie_entry)
+            {
+              width = bit_width (iter->depth);
+              iter->path <<= width;
+              j = 0;
+              subtrie = (struct subtrie *) iter->entry[iter->depth];
+            }
+          iter->position = 0;
+          break;
+        }
+      iter->path >>= width;
+    }
+
+  return true;
+}
+
+/***********************/
+/* Destructive Updates */
+/***********************/
+
+/* If *ELT_PTR matches an element already in HAMT, set *ELT_PTR to the
+   element from the table and return false.  Otherwise, insert *ELT_PTR
+   destructively into the hamt and return true.  */
+bool
+hamt_insert_x (Hamt *hamt, Hamt_entry **elt_ptr)
+{
+  Hamt_entry *elt = *elt_ptr;
+  Hamt_entry *old_root = hamt->root;
+  hamt->root = root_insert (hamt->functions, old_root, elt_ptr, false, false);
+  if (old_root != hamt->root && old_root != NULL)
+    free_entry (hamt->functions, old_root);
+  if (*elt_ptr == NULL)
+    {
+      *elt_ptr = elt;
+      return false;
+    }
+  return *elt_ptr == elt;
+}
+
+/* Insert ELT destructively into HAMT.  If an existing element was
+   replaced, return true.  Otherwise, return false.  */
+bool
+hamt_replace_x (Hamt *hamt, Hamt_entry *elt)
+{
+  Hamt_entry *old_root = hamt->root;
+  hamt->root = root_insert (hamt->functions, old_root, &elt, true, false);
+  if (old_root != hamt->root && old_root != NULL)
+    free_entry (hamt->functions, old_root);
+  return elt != NULL;
+}
+
+/* If ELT matches an element already in HAMT, remove the element
+   destructively from the hamt and return true.  Otherwise, return
+   false.  */
+bool
+hamt_remove_x (Hamt *hamt, Hamt_entry *elt)
+{
+  Hamt_entry *old_root = hamt->root;
+  hamt->root = root_remove (hamt->functions, old_root, &elt, false);
+  if (old_root != hamt->root)
+    free_entry (hamt->functions, old_root);
+  return elt != NULL;
+}
diff --git a/lib/hamt.h b/lib/hamt.h
new file mode 100644
index 000000000..55ee964dd
--- /dev/null
+++ b/lib/hamt.h
@@ -0,0 +1,253 @@
+/* (Persistent) hash array mapped tries.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Marc Nieper-Wißkirchen <marc@nieper-wisskirchen.de>, 2021. */
+
+/* This module provides a persistent version of hash array mapped
+   tries (hamts) that can be used in place of hash tables when pure
+   (functional) operations are needed.
+
+   A hash function and an equivalence predicate has to be provided for
+   the elements that can be inserted, replaced and removed in a hamt.
+   A hamt cannot contain duplicates that compare equal.
+
+   Each non-destructive updating operation returns a new hamt, which
+   shares structure with the original one.  Destructive updates only
+   effect the hamt, on which the destructive operation is applied.
+   For example, given a hamt HAMT1, any non-destructive update
+   operation (e.g. hamt_insert) will result in a new hamt HAMT2.
+   Whatever further operations (destructive or not, including freeing
+   a hamt) are applied to HAMT1 won't change HAMT2 and vice versa.  To
+   free all the memory, hash_free has therefore to be applied to both
+   HAMT1 and HAMT2.
+
+   If persistence is not needed, transient hash tables are probably
+   faster.
+
+   See also: Phil Bagwell (2000). Ideal Hash Trees (Report). Infoscience
+   Department, École Polytechnique Fédérale de Lausanne.
+
+   http://infoscience.epfl.ch/record/64398/files/idealhashtrees.pdf  */
+
+#ifndef _GL_HAMT_H
+#define _GL_HAMT_H
+
+#ifndef _GL_INLINE_HEADER_BEGIN
+# error "Please include config.h first."
+#endif
+_GL_INLINE_HEADER_BEGIN
+#ifndef _GL_HAMT_INLINE
+# define _GL_HAMT_INLINE _GL_INLINE
+#endif
+
+/* The GL_HAMT_THREAD_SAFE flag is set if the implementation of hamts
+   is thread-safe as long as two threads do not simultaneously access
+   the same hamt.  This is non-trivial as different hamts may share
+   some structure.  */
+
+#if (__STDC_VERSION__ < 201112 || defined __STD_NO_ATOMICS__) \
+  && __GNUC__ + (__GNUC_MINOR >= 9) <= 4
+# define GL_HAMT_THREAD_SAFE 0
+#else
+# define GL_HAMT_THREAD_SAFE 1
+#endif
+
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* Hash values are of type size_t.  For each level of the trie, we use
+   5 bits (corresponding to lg2 of the width of a 32-bit word.  */
+#define _GL_HAMT_MAX_DEPTH ((SIZE_WIDTH + 4) / 5)
+
+/************/
+/* Elements */
+/************/
+
+/* A hamt stores pointers to elements.  Each element has to be a
+   struct whose initial member is of the type Hamt_entry.  An element
+   is conceptually owned by a hamt as soon as it is inserted.  It will
+   be automatically freed as soon as the last hamt containing it is
+   freed.  */
+typedef struct
+{
+#if GL_HAMT_THREAD_SAFE
+  _Atomic
+#endif
+  size_t ref_count;
+} Hamt_entry;
+
+/* Initialize *ELT, which has to point to a structure as described
+   above and return ELT type-cast.
+
+   Before an element is inserted into any hamt, whether once or
+   multiple times, it has to be initialized exactly once.  */
+_GL_HAMT_INLINE Hamt_entry *
+hamt_element (void *elt)
+{
+  Hamt_entry *entry = elt;
+  entry->ref_count = 0;         /* This assumes that element_entry ==
+                                   0.  */
+  return entry;
+}
+
+/*************************/
+/* Opaque Hamt Structure */
+/*************************/
+
+/* In user-code, hamts are accessed through pointers to the opaque
+   Hamt type.  Two hamts are said to be the same if and only if their
+   pointers are equal. */
+typedef struct hamt Hamt;
+
+/*************/
+/* Interface */
+/*************/
+
+/* A hash function has to be pure, and two elements that compare equal
+   have to have the same hash value.  For a hash function to be a good
+   one, it is important that it uses all SIZE_WIDTH bits in
+   size_t.  */
+typedef size_t (Hamt_hasher) (const void *elt);
+
+/* A comparison function has to be pure, and two elements that have
+   equal pointers have to compare equal.  */
+typedef bool (Hamt_comparator) (const void *elt1, const void *elt2);
+
+/* A user-defined function that is called when the last hamt
+   containing a particular element is freed.  */
+typedef void (Hamt_freer) (Hamt_entry *elt);
+
+/****************************/
+/* Creation and Destruction */
+/****************************/
+
+/* Create and return a new and empty hash array mapped trie.  */
+extern Hamt *hamt_create (Hamt_hasher *hasher, Hamt_comparator *comparator,
+                          Hamt_freer *freer)
+  _GL_ATTRIBUTE_NODISCARD;
+
+/* Return a copy of HAMT, which is not the same in the sense above.
+   This procedure can be used, for example, so that two threads can
+   access the same data independently.  */
+extern Hamt *hamt_copy (Hamt *hamt) _GL_ATTRIBUTE_NODISCARD;
+
+/* Free the resources solely allocated by HAMT and all elements solely
+   contained in it.  */
+extern void hamt_free (Hamt *hamt);
+
+/**********/
+/* Lookup */
+/**********/
+
+/* If ELT matches an entry in HAMT, return this entry.  Otherwise,
+   return NULL.  */
+extern Hamt_entry *hamt_lookup (const Hamt *hamt, const void *elt);
+
+/**************************/
+/* Insertion and Deletion */
+/**************************/
+
+/* If *ELT_PTR matches an element already in HAMT, set *ELT_PTR to the
+   existing element and return the original hamt.  Otherwise, insert
+   *ELT_PTR into a copy of the hamt and return the copy.  */
+extern Hamt *hamt_insert (Hamt *hamt, Hamt_entry **elt_ptr)
+  _GL_ATTRIBUTE_NODISCARD;
+
+/* If *ELT_PTR matches an element already in HAMT, set *ELT_PTR to the
+existing element, remove the element from a copy of the hamt and
+return the copy.  Otherwise, return the original hamt.  */
+extern Hamt *hamt_remove (Hamt *hamt, Hamt_entry **elt_ptr)
+  _GL_ATTRIBUTE_NODISCARD;
+
+/* Insert *ELT_PTR into a copy of HAMT and return the copy.  If an
+   existing element was replaced, set *ELT_PTR to this element, and to
+   NULL otherwise.  */
+extern Hamt *hamt_replace (Hamt *hamt, Hamt_entry **elt_ptr)
+  _GL_ATTRIBUTE_NODISCARD;
+
+/*************/
+/* Iteration */
+/*************/
+
+/* A processor function is called during walking of a hamt, which
+   returns true to continue the walking.  */
+typedef bool (Hamt_processor) (Hamt_entry *elt, void *data);
+
+/* Call PROC for every entry of the hamt until it returns false.  The
+   first argument to the processor is the entry, the second argument
+   is the value of DATA as received.  Return the number of calls that
+   returned true.  During processing, the hamt mustn't be
+   modified.  */
+extern size_t hamt_do_while (const Hamt *hamt, Hamt_processor *proc,
+                             void *data);
+
+/* An alternative interface to iterating through the entry of a hamt
+   that does not make use of a higher-order function like
+   hamt_do_while uses the Hamt_iterator type, which can be allocated
+   through automatic variables on the stack.  As a hamt iterator
+   operates on a copy of a hamt, the original hamt can modified
+   (including freeing it) without affecting the iterator.  */
+struct hamt_iterator
+{
+  Hamt* hamt;
+  int depth;
+  size_t path;
+  size_t position;
+  Hamt_entry *entry[_GL_HAMT_MAX_DEPTH + 1];
+};
+typedef struct hamt_iterator Hamt_iterator;
+
+/* Create of copy of HAMT and return an initialized ITER on the
+   copy.  */
+extern Hamt_iterator hamt_iterator (Hamt *hamt);
+
+/* Free the resources allocated for ITER, including the hamt copy
+   associated with it.  */
+extern void hamt_iterator_free (Hamt_iterator *iter);
+
+/* Return an independent copy of ITER that is initially in the same
+   state.  Any operation on the original iterator (including freeing
+   it) doesn't affect the iterator copy and vice versa.  */
+extern Hamt_iterator hamt_iterator_copy (Hamt_iterator *iter);
+
+/* Return true if ITER is not at the end, place the current element in
+   *ELT_PTR and move the iterator forward.  Otherwise, return
+   false.  */
+extern bool hamt_iterator_next (Hamt_iterator *iter,
+                                Hamt_entry **elt_ptr);
+
+/***********************/
+/* Destructive Updates */
+/***********************/
+
+/* If *ELT_PTR matches an element already in HAMT, set *ELT_PTR to the
+   element from the table and return false.  Otherwise, insert *ELT_PTR
+   destructively into the hamt and return true.  */
+extern bool hamt_insert_x (Hamt *hamt, Hamt_entry **elt_ptr);
+
+/* Insert ELT destructively into HAMT.  If an existing element was
+   replaced, return true.  Otherwise, return false.  */
+extern bool hamt_replace_x (Hamt *hamt, Hamt_entry *elt);
+
+/* If ELT matches an element already in HAMT, remove the element
+   destructively from the hamt and return true.  Otherwise, return
+   false.  */
+extern bool hamt_remove_x (Hamt *hamt, Hamt_entry *elt);
+
+_GL_INLINE_HEADER_END
+
+#endif /* _GL_HAMT_H */
diff --git a/modules/hamt b/modules/hamt
new file mode 100644
index 000000000..d73f09c2d
--- /dev/null
+++ b/modules/hamt
@@ -0,0 +1,29 @@
+Description:
+Persistent hash array mapped tries.
+
+Files:
+lib/hamt.h
+lib/hamt.c
+
+Depends-on:
+count-one-bits
+flexmember
+inttypes-incomplete
+stdbool
+stdint
+verify
+xalloc
+
+configure.ac:
+
+Makefile.am:
+lib_SOURCES += hamt.c
+
+Include:
+"hamt.h"
+
+License:
+GPL
+
+Maintainer:
+Marc Nieper-Wisskirchen
diff --git a/modules/hamt-tests b/modules/hamt-tests
new file mode 100644
index 000000000..f4f0ea4e0
--- /dev/null
+++ b/modules/hamt-tests
@@ -0,0 +1,11 @@
+Files:
+tests/test-hamt.c
+tests/macros.h
+
+Depends-on:
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-hamt
+check_PROGRAMS += test-hamt
diff --git a/tests/test-hamt.c b/tests/test-hamt.c
new file mode 100644
index 000000000..0cf6eb5a9
--- /dev/null
+++ b/tests/test-hamt.c
@@ -0,0 +1,378 @@
+/* Test of persistent hash array mapped trie implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Marc Nieper-Wißkirchen <marc@nieper-wisskirchen.de>, 2021.  */
+
+#include <config.h>
+
+#include "hamt.h"
+#include "macros.h"
+#include "xalloc.h"
+
+typedef struct
+{
+  Hamt_entry entry;
+  int val;
+} Element;
+
+static int
+entry_value (const void *elt)
+{
+  return ((Element *) elt)->val;
+}
+
+static size_t
+hash_element (const void *elt)
+{
+  return entry_value (elt) & ~3; /* We drop the last bits so that we
+                                    can test hash collisions. */
+}
+
+static bool
+compare_element (const void *elt1, const void *elt2)
+{
+  return entry_value (elt1) == entry_value (elt2);
+}
+
+static void
+free_element (Hamt_entry *elt)
+{
+  free (elt);
+}
+
+static Hamt_entry *
+make_element (int n)
+{
+  Element *elt = XMALLOC (Element);
+  elt->val = n;
+  return hamt_element (&elt->entry);
+}
+
+static Hamt *
+test_hamt_create (void)
+{
+  return hamt_create (hash_element, compare_element, free_element);
+}
+
+
+static int sum = 0;
+static int flag;
+
+static bool
+proc (Hamt_entry *elt, void *data)
+{
+  if (data == &flag)
+    {
+      sum += entry_value (elt);
+      return true;
+    }
+  if (sum > 0)
+    {
+      sum = 0;
+      return true;
+    }
+  return false;
+}
+
+static void
+test_general (void)
+{
+  Hamt *hamt = test_hamt_create ();
+
+  Hamt_entry *x5 = make_element (5);
+  Hamt_entry *p = x5;
+  Hamt *hamt1 = hamt_insert (hamt, &p);
+  ASSERT (hamt1 != hamt);
+  ASSERT (hamt_lookup (hamt, x5) == NULL);
+  ASSERT (hamt_lookup (hamt1, x5) == x5);
+  hamt_free (hamt);
+
+  Hamt_entry *y5 = make_element (5);
+  p = y5;
+  Hamt *hamt2 = hamt_insert (hamt1, &p);
+  ASSERT (hamt2 == hamt1);
+  ASSERT (p == x5);
+  ASSERT (hamt_lookup (hamt1, y5) == x5);
+
+  p = y5;
+  hamt = hamt_replace (hamt1, &p);
+  ASSERT (p == x5);
+  ASSERT (hamt_lookup (hamt, y5) == y5);
+  hamt_free (hamt);
+  y5 = make_element (5);
+
+  Hamt_entry *z37 = make_element (37);
+  p = z37;
+  hamt2 = hamt_insert (hamt1, &p);
+  ASSERT (hamt2 != hamt1);
+  ASSERT (p == z37);
+  ASSERT (hamt_lookup (hamt1, z37) == NULL);
+  ASSERT (hamt_lookup (hamt2, z37) == z37);
+  hamt_free (hamt1);
+
+  ASSERT (hamt_lookup (hamt2, x5) == x5);
+  ASSERT (hamt_lookup (hamt2, z37) == z37);
+
+  ASSERT (hamt_do_while (hamt2, proc, &flag) == 2);
+  ASSERT (sum == 42);
+  ASSERT (hamt_do_while (hamt2, proc, NULL) == 1);
+  ASSERT (sum == 0);
+
+  p = y5;
+  hamt1 = hamt_remove (hamt2, &p);
+  ASSERT (hamt1 != hamt2);
+  ASSERT (p == x5);
+
+  ASSERT (hamt_lookup (hamt1, x5) == NULL);
+  ASSERT (hamt_lookup (hamt2, x5) == x5);
+
+  hamt_free (hamt1);
+  Hamt_entry *x4 = make_element (4);
+  hamt1 = hamt_insert (hamt2, &x4);
+  hamt_free (hamt2);
+  Hamt_entry *x6 = make_element (6);
+  hamt2 = hamt_insert (hamt1, &x6);
+  hamt_free (hamt1);
+  ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
+  ASSERT (sum == 52);
+
+  hamt1 = hamt_remove (hamt2, &x4);
+  sum = 0;
+  ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
+  ASSERT (sum = 52);
+  sum = 0;
+  ASSERT (hamt_do_while (hamt1, proc, &flag) == 3);
+  ASSERT (sum  = 48);
+
+  hamt_free (hamt1);
+  hamt_free (hamt2);
+  free_element (y5);
+}
+
+static bool
+true_processor (_GL_ATTRIBUTE_MAYBE_UNUSED Hamt_entry *elt,
+                _GL_ATTRIBUTE_MAYBE_UNUSED void *data)
+{
+  return true;
+}
+
+static size_t
+element_count (Hamt *hamt)
+{
+  return hamt_do_while (hamt, true_processor, NULL);
+}
+
+struct find_values_context
+{
+  size_t n;
+  int *elts;
+  bool *found;
+};
+
+static bool
+find_values_processor (Hamt_entry *entry, void *data)
+{
+  struct find_values_context *ctx = data;
+  int val = entry_value (entry);
+  for (size_t i = 0; i < ctx->n; ++i)
+    if (ctx->elts [i] == val && !ctx->found [i])
+      {
+        ctx->found [i] = true;
+        return true;
+      }
+  return false;
+}
+
+static bool
+find_values (Hamt *hamt, size_t n, int *elts)
+{
+  bool *found = XCALLOC (n, bool);
+  struct find_values_context ctx = {n, elts, found};
+  bool res = hamt_do_while (hamt, find_values_processor, &ctx) == n;
+  free (found);
+  return res;
+}
+
+static size_t
+insert_values (Hamt **hamt, size_t n, int *elts, bool destructive)
+{
+  size_t cnt = 0;
+  for (size_t i = 0; i < n; ++i)
+    {
+      Hamt_entry *p = make_element (elts [i]);
+      Hamt_entry *q = p;
+      if (destructive)
+        {
+          if (hamt_insert_x (*hamt, &p))
+            ++cnt;
+          else
+            free_element (q);
+        }
+      else
+        {
+          Hamt *new_hamt = hamt_insert (*hamt, &p);
+          if (new_hamt != *hamt)
+            {
+              hamt_free (*hamt);
+              *hamt = new_hamt;
+              ++cnt;
+            }
+          else
+            {
+              free_element (q);
+            }
+        }
+    }
+  return cnt;
+}
+
+static size_t
+replace_values (Hamt **hamt, size_t n, int *elts, bool destructive)
+{
+  size_t cnt = 0;
+  for (size_t i = 0; i < n; ++i)
+    {
+      Hamt_entry *p = make_element (elts [i]);
+      if (destructive)
+        {
+          if (hamt_replace_x (*hamt, p))
+            ++cnt;
+        }
+      else
+        {
+          Hamt *new_hamt = hamt_replace (*hamt, &p);
+          hamt_free (*hamt);
+          *hamt = new_hamt;
+          if (p != NULL)
+            ++cnt;
+        }
+    }
+  return cnt;
+}
+
+static size_t
+remove_values (Hamt **hamt, size_t n, int *elts, bool destructive)
+{
+  size_t cnt = 0;
+  for (size_t i = 0; i < n; ++i)
+    {
+      Hamt_entry *p = make_element (elts [i]);
+      Hamt_entry *q = p;
+      if (destructive)
+        {
+          if (hamt_remove_x (*hamt, p))
+            ++cnt;
+        }
+      else
+        {
+          Hamt *new_hamt = hamt_remove (*hamt, &p);
+          if (new_hamt != *hamt)
+            {
+              hamt_free (*hamt);
+              *hamt = new_hamt;
+              ++cnt;
+            }
+        }
+      free (q);
+    }
+  return cnt;
+}
+
+static int val_array1 [10] = {1, 2, 3, 4, 33, 34, 35, 36, 1024, 1025};
+static int val_array2 [10] = {1, 2, 34, 36, 1025, 32768, 32769, 32770, 32771,
+                              32772};
+
+static void
+test_functional_update (void)
+{
+  Hamt *hamt = test_hamt_create ();
+
+  ASSERT (insert_values (&hamt, 10, val_array1, false) == 10);
+  ASSERT (element_count (hamt) == 10);
+  ASSERT (find_values (hamt, 10, val_array1));
+  ASSERT (insert_values (&hamt, 10, val_array2, false) == 5);
+  ASSERT (element_count (hamt) == 15);
+  ASSERT (remove_values (&hamt, 10, val_array1, false) == 10);
+  ASSERT (element_count (hamt) == 5);
+  ASSERT (remove_values (&hamt, 10, val_array2, false) == 5);
+  ASSERT (element_count (hamt) == 0);
+
+  ASSERT (replace_values (&hamt, 10, val_array1, false) == 0);
+  ASSERT (element_count (hamt) == 10);
+  ASSERT (find_values (hamt, 10, val_array1));
+  ASSERT (replace_values (&hamt, 10, val_array2, false) == 5);
+  ASSERT (element_count (hamt) == 15);
+
+  hamt_free (hamt);
+}
+
+static void
+test_destructive_update (void)
+{
+  Hamt *hamt = test_hamt_create ();
+
+  ASSERT (insert_values (&hamt, 10, val_array1, true) == 10);
+  ASSERT (element_count (hamt) == 10);
+  ASSERT (find_values (hamt, 10, val_array1));
+  ASSERT (insert_values (&hamt, 10, val_array2, true) == 5);
+  ASSERT (element_count (hamt) == 15);
+  ASSERT (remove_values (&hamt, 10, val_array1, true) == 10);
+  ASSERT (element_count (hamt) == 5);
+  ASSERT (remove_values (&hamt, 10, val_array2, true) == 5);
+  ASSERT (element_count (hamt) == 0);
+
+  ASSERT (replace_values (&hamt, 10, val_array1, true) == 0);
+  ASSERT (element_count (hamt) == 10);
+  ASSERT (find_values (hamt, 10, val_array1));
+  ASSERT (replace_values (&hamt, 10, val_array2, true) == 5);
+  ASSERT (element_count (hamt) == 15);
+
+  hamt_free (hamt);
+}
+
+static void
+test_iterator (void)
+{
+  Hamt *hamt = test_hamt_create ();
+  ASSERT (insert_values (&hamt, 10, val_array1, true) == 10);
+  Hamt_iterator iter [1] = {hamt_iterator (hamt)};
+  size_t cnt = 0;
+  bool found [10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+  Hamt_entry *p;
+  while (hamt_iterator_next (iter, &p))
+    {
+      for (size_t i = 0; i < 10; ++i)
+        if (val_array1 [i] == entry_value (p))
+          {
+            ASSERT (!found [i]);
+            found [i] = true;
+            ++cnt;
+            break;
+          }
+    }
+  ASSERT (cnt == 10);
+  hamt_iterator_free (iter);
+  hamt_free (hamt);
+}
+
+int
+main (void)
+{
+  test_general ();
+  test_functional_update ();
+  test_destructive_update ();
+  test_iterator ();
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] hamt: New module.
  2021-04-03  9:25 [PATCH] hamt: New module Marc Nieper-Wißkirchen
@ 2021-04-03 17:02 ` Bruno Haible
  2021-04-03 18:59   ` Marc Nieper-Wißkirchen
  2021-04-05 13:02 ` Bruno Haible
  1 sibling, 1 reply; 9+ messages in thread
From: Bruno Haible @ 2021-04-03 17:02 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Marc Nieper-Wißkirchen

[-- Attachment #1: Type: text/plain, Size: 856 bytes --]

Hi Marc,

> +	hamt: New module.
> +	This module provides (persistent) hash array mapped tries.
> +	* MODULES.html.sh: Add hamt.
> +	* lib/hamt.c: New file.
> +	* lib/hamt.h: New file.
> +	* modules/hamt: New file.
> +	* modules/hamt-tests: New file.
> +	* tests/test-hamt.c: New file.

Would you like to write a bit of documentation for the Gnulib manual
about it? It does not need to copy the extensive comments in hamt.h.
It need only answer the questions:
  - What is a HAMT?
  - When would I (a programmer) make use of it? How does it compare
    to other container data types?
  - What is the Gnulib module name and the include file name?

I think it would fit into the section "Container data types"
https://www.gnu.org/software/gnulib/manual/html_node/Container-data-types.html .

Find attached the start of a diff to add this documentation.

Bruno

[-- Attachment #2: hamt-doc-start.diff --]
[-- Type: text/x-patch, Size: 722 bytes --]

diff --git a/doc/containers.texi b/doc/containers.texi
index 15c915b..6a9b22e 100644
--- a/doc/containers.texi
+++ b/doc/containers.texi
@@ -35,6 +35,9 @@ log
 Gnulib provides several generic container data types.  They can be used
 to organize collections of application-defined objects.
 
+@node Ordinary containers
+@subsection Ordinary container data types
+
 @multitable @columnfractions .15 .5 .1 .1 .15
 @headitem Data type
 @tab Details
@@ -599,6 +602,11 @@ For C++, Gnulib provides a C++ template class for each of these container data t
 @tab @code{"gl_omap.hh"}
 @end multitable
 
+@node Specialized containers
+@subsection Specialized container data types
+
+HAMT ...
+
 @ifnottex
 @unmacro log
 @end ifnottex

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] hamt: New module.
  2021-04-03 17:02 ` Bruno Haible
@ 2021-04-03 18:59   ` Marc Nieper-Wißkirchen
  2021-04-03 19:55     ` Bruno Haible
  0 siblings, 1 reply; 9+ messages in thread
From: Marc Nieper-Wißkirchen @ 2021-04-03 18:59 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Marc Nieper-Wißkirchen, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 3765 bytes --]

Hi Bruno,

what do you think of the attempt attached below?

Thanks,

Marc

diff --git a/doc/containers.texi b/doc/containers.texi
index 15c915b93..35caf200c 100644
--- a/doc/containers.texi
+++ b/doc/containers.texi
@@ -35,6 +35,9 @@ log
 Gnulib provides several generic container data types.  They can be used
 to organize collections of application-defined objects.

+@node Ordinary containers
+@subsection Ordinary container data types
+
 @multitable @columnfractions .15 .5 .1 .1 .15
 @headitem Data type
 @tab Details
@@ -599,6 +602,46 @@ For C++, Gnulib provides a C++ template class for each
of these container data t
 @tab @code{"gl_omap.hh"}
 @end multitable

+@node Specialized containers
+@subsection Specialized container data types
+
+The @code{hamt} module provides a persistant version of persistent hash
+array mapped tries (HAMTs).  A HAMT is an array mapped trie in which
+elements are stored according to the initial bits of their hash values.
+
+In the current implementation, each inner node of the HAMT can store up
+to @math{32 = 2^5} elements and subtries.  Whenever a collision between
+the initial bits of the hash values of two elements happens, the next
+@math{5} bits of the hash values are examined and the two elements
+pushed down one level in the trie.
+
+HAMTs have the same average access times as hash tables but grow and
+shrink dynamically, so they use memory more economically and do not have
+to be periodically resized.
+
+They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
+   Hash Trees (Report). Infoscience Department, École Polytechnique
+   Fédérale de Lausanne.}
+
+HAMTs are well-suited to a persistent data structure, which means that
+each updating operation (like inserting, replacing, or removing an
+element) returns a new HAMT while leaving the original one intact.  This
+is achieved through structure sharing, which is even safe in the
+presence of multiple threads when the used C compiler supports atomics.
+
+A HAMT can be used whenever an ordinary hash table would be used.  It
+does however, provide non-destructive updating operations without the
+need to copy the whole container  On the other hand, a hash table is
+simpler so that its performance may be better when persistence is not
+needed.
+
+For example, a HAMT can be used to model the dynamic environment in a
+LISP interpreter.  Updating a value in the dynamic environment of one
+continuation frame would not modify values in earlier frames.
+
+To use the module, include @code{hamt.h} in your code.  The public
+interface is documented in that header file.
+
 @ifnottex
 @unmacro log
 @end ifnottex


Am Sa., 3. Apr. 2021 um 19:02 Uhr schrieb Bruno Haible <bruno@clisp.org>:

> Hi Marc,
>
> > +     hamt: New module.
> > +     This module provides (persistent) hash array mapped tries.
> > +     * MODULES.html.sh: Add hamt.
> > +     * lib/hamt.c: New file.
> > +     * lib/hamt.h: New file.
> > +     * modules/hamt: New file.
> > +     * modules/hamt-tests: New file.
> > +     * tests/test-hamt.c: New file.
>
> Would you like to write a bit of documentation for the Gnulib manual
> about it? It does not need to copy the extensive comments in hamt.h.
> It need only answer the questions:
>   - What is a HAMT?
>   - When would I (a programmer) make use of it? How does it compare
>     to other container data types?
>   - What is the Gnulib module name and the include file name?
>
> I think it would fit into the section "Container data types"
>
> https://www.gnu.org/software/gnulib/manual/html_node/Container-data-types.html
> .
>
> Find attached the start of a diff to add this documentation.
>
> Bruno
>

[-- Attachment #2: Type: text/html, Size: 5131 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] hamt: New module.
  2021-04-03 18:59   ` Marc Nieper-Wißkirchen
@ 2021-04-03 19:55     ` Bruno Haible
  2021-04-03 20:32       ` Marc Nieper-Wißkirchen
  0 siblings, 1 reply; 9+ messages in thread
From: Bruno Haible @ 2021-04-03 19:55 UTC (permalink / raw)
  To: Marc Nieper-Wißkirchen; +Cc: bug-gnulib

Hi Marc,

> what do you think of the attempt attached below?

I believe that technical documentation of a feature should be written
to answer the questions in this order:
  1. What is it? (Short summary)
  2. When would I use it? (Use cases)
  3. How do I use it?
  4. Further details.

The idea is to get the reader understand rapidly whether they want to
use the feature or not. That is the purpose of parts 1 and 2.

For part 1:

> +The @code{hamt} module provides a persistant version of persistent hash
> +array mapped tries (HAMTs).

"A persistant version of persistent ..." ?

The term "persistent" is only explained much later.

> A HAMT is an array mapped trie in which
> +elements are stored according to the initial bits of their hash values.

This is a technical detail, to be mentioned later.

I would start in a different way. How about this?

  The @code{hamt} module implements the hash array mapped trie (HAMT)
  data structure.  This is a data structure that contains (key, value)
  pairs (or just plain keys, if no value is needed).  Lookup of a
  (key, value) pair given the key is an O(1) operation, assuming a
  good hash function for the keys is employed.

  The HAMT data structure is useful when you want modifications
  (additions of pairs, removal, value changes) to be visible only to
  some part of your program, whereas other parts of the program
  continue to use the unmodified HAMT.  The HAMT makes this possible
  in a space-efficient manner: the modified and the unmodified HAMT
  share most of their allocated memory.  It is also time-efficient:
  Every such modification is O(1) on average, again assuming a good
  hash function for the keys

For part 2:

> +A HAMT can be used whenever an ordinary hash table would be used.  It
> +does however, provide non-destructive updating operations without the
> +need to copy the whole container  On the other hand, a hash table is
> +simpler so that its performance may be better when persistence is not
> +needed.
> +
> +For example, a HAMT can be used to model the dynamic environment in a
> +LISP interpreter.  Updating a value in the dynamic environment of one
> +continuation frame would not modify values in earlier frames.

This is good; this should come right after part 1.

Now part 3, right after part 2:

> +To use the module, include @code{hamt.h} in your code.  The public
> +interface is documented in that header file.

OK.

And the rest are details that can come after part 3:

> +In the current implementation, each inner node of the HAMT can store up
> +to @math{32 = 2^5} elements and subtries.  Whenever a collision between
> +the initial bits of the hash values of two elements happens, the next
> +@math{5} bits of the hash values are examined and the two elements
> +pushed down one level in the trie.
> +
> +HAMTs have the same average access times as hash tables but grow and
> +shrink dynamically, so they use memory more economically and do not have
> +to be periodically resized.
> +
> +They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
> +   Hash Trees (Report). Infoscience Department, École Polytechnique
> +   Fédérale de Lausanne.}
> +
> +HAMTs are well-suited to a persistent data structure, which means that
> +each updating operation (like inserting, replacing, or removing an
> +element) returns a new HAMT while leaving the original one intact.  This
> +is achieved through structure sharing, which is even safe in the
> +presence of multiple threads when the used C compiler supports atomics.
> +

By the way, I just starred at the Hamt_entry struct and wondered how to
make use of it. The test class shows it. How about extending the comment
a bit:

   Each element has to be a
   struct whose initial member is of the type Hamt_entry.
->
   Each element has to be a
   struct whose initial member is of the type Hamt_entry.  You need to
   define this struct yourself.  It will typically contain a
   Hamt_entry, a key, and optionally a value.

Bruno



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] hamt: New module.
  2021-04-03 19:55     ` Bruno Haible
@ 2021-04-03 20:32       ` Marc Nieper-Wißkirchen
  2021-04-03 21:44         ` Bruno Haible
  0 siblings, 1 reply; 9+ messages in thread
From: Marc Nieper-Wißkirchen @ 2021-04-03 20:32 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Marc Nieper-Wißkirchen, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 8786 bytes --]

Hi Bruno,

thank you very much for all the helpful feedback. I have incorporated your
suggestions, and you can find an updated diff below.

Thanks,

Marc

diff --git a/doc/containers.texi b/doc/containers.texi
index 15c915b93..9bc1ae43d 100644
--- a/doc/containers.texi
+++ b/doc/containers.texi
@@ -35,6 +35,9 @@ log
 Gnulib provides several generic container data types.  They can be used
 to organize collections of application-defined objects.

+@node Ordinary containers
+@subsection Ordinary container data types
+
 @multitable @columnfractions .15 .5 .1 .1 .15
 @headitem Data type
 @tab Details
@@ -599,6 +602,58 @@ For C++, Gnulib provides a C++ template class for each
of these container data t
 @tab @code{"gl_omap.hh"}
 @end multitable

+@node Specialized containers
+@subsection Specialized container data types
+
+The @code{hamt} module implements the hash array mapped trie (HAMT) data
+structure.  This is a data structure that contains (key, value) pairs.
+Lookup of a (key, value) pair given the key is an O(1) operation,
+assuming a good hash function for the keys is employed.
+
+The HAMT data structure is useful when you want modifications (additions
+of pairs, removal, value changes) to be visible only to some part of
+your program, whereas other parts of the program continue to use the
+unmodified HAMT.  The HAMT makes this possible in a space-efficient
+manner: the modified and the unmodified HAMT share most of their
+allocated memory.  It is also time-efficient: Every such modification
+is O(1) on average, again assuming a good hash function for the keys.
+
+A HAMT can be used whenever an ordinary hash table would be used.  It
+does however, provide non-destructive updating operations without the
+need to copy the whole container  On the other hand, a hash table is
+simpler so that its performance may be better when persistence is not
+needed.
+
+For example, a HAMT can be used to model the dynamic environment in a
+LISP interpreter.  Updating a value in the dynamic environment of one
+continuation frame would not modify values in earlier frames.
+
+To use the module, include @code{hamt.h} in your code.  The public
+interface is documented in that header file.  You have to provide a hash
+function and an equivalence relation, which defines key equality.  The
+module includes a test file @code{test-hamt.c}, which demonstrates how
+the API can be used.
+
+In the current implementation, each inner node of the HAMT can store up
+to @math{32 = 2^5} entries and subtries.  Whenever a collision between
+the initial bits of the hash values of two entries would happen, the
+next @math{5} bits of the hash values are examined and the two entries
+pushed down one level in the trie.
+
+HAMTs have the same average access times as hash tables but grow and
+shrink dynamically, so they use memory more economically and do not have
+to be periodically resized.
+
+They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
+Hash Trees (Report). Infoscience Department, École Polytechnique
+Fédérale de Lausanne.}
+
+HAMTs are well-suited to a persistent data structure, which means that
+each updating operation (like inserting, replacing, or removing an
+element) returns a new HAMT while leaving the original one intact.  This
+is achieved through structure sharing, which is even safe in the
+presence of multiple threads when the used C compiler supports atomics.
+
 @ifnottex
 @unmacro log
 @end ifnottex
diff --git a/lib/hamt.h b/lib/hamt.h
index 55ee964dd..25a0ad9f9 100644
--- a/lib/hamt.h
+++ b/lib/hamt.h
@@ -78,10 +78,13 @@ _GL_INLINE_HEADER_BEGIN
 /************/

 /* A hamt stores pointers to elements.  Each element has to be a
-   struct whose initial member is of the type Hamt_entry.  An element
-   is conceptually owned by a hamt as soon as it is inserted.  It will
-   be automatically freed as soon as the last hamt containing it is
-   freed.  */
+   struct whose initial member is of the type Hamt_entry.  You need to
+   define this struct yourself.  It will typically contain an
+   Hamt_entry, a key, and, optionally, a value.
+
+   An element is conceptually owned by a hamt as soon as it is
+   inserted.  It will be automatically freed as soon as the last hamt
+   containing it is freed.  */
 typedef struct
 {
 #if GL_HAMT_THREAD_SAFE

Am Sa., 3. Apr. 2021 um 21:55 Uhr schrieb Bruno Haible <bruno@clisp.org>:

> Hi Marc,
>
> > what do you think of the attempt attached below?
>
> I believe that technical documentation of a feature should be written
> to answer the questions in this order:
>   1. What is it? (Short summary)
>   2. When would I use it? (Use cases)
>   3. How do I use it?
>   4. Further details.
>
> The idea is to get the reader understand rapidly whether they want to
> use the feature or not. That is the purpose of parts 1 and 2.
>
> For part 1:
>
> > +The @code{hamt} module provides a persistant version of persistent hash
> > +array mapped tries (HAMTs).
>
> "A persistant version of persistent ..." ?
>
> The term "persistent" is only explained much later.
>
> > A HAMT is an array mapped trie in which
> > +elements are stored according to the initial bits of their hash values.
>
> This is a technical detail, to be mentioned later.
>
> I would start in a different way. How about this?
>
>   The @code{hamt} module implements the hash array mapped trie (HAMT)
>   data structure.  This is a data structure that contains (key, value)
>   pairs (or just plain keys, if no value is needed).  Lookup of a
>   (key, value) pair given the key is an O(1) operation, assuming a
>   good hash function for the keys is employed.
>
>   The HAMT data structure is useful when you want modifications
>   (additions of pairs, removal, value changes) to be visible only to
>   some part of your program, whereas other parts of the program
>   continue to use the unmodified HAMT.  The HAMT makes this possible
>   in a space-efficient manner: the modified and the unmodified HAMT
>   share most of their allocated memory.  It is also time-efficient:
>   Every such modification is O(1) on average, again assuming a good
>   hash function for the keys
>
> For part 2:
>
> > +A HAMT can be used whenever an ordinary hash table would be used.  It
> > +does however, provide non-destructive updating operations without the
> > +need to copy the whole container  On the other hand, a hash table is
> > +simpler so that its performance may be better when persistence is not
> > +needed.
> > +
> > +For example, a HAMT can be used to model the dynamic environment in a
> > +LISP interpreter.  Updating a value in the dynamic environment of one
> > +continuation frame would not modify values in earlier frames.
>
> This is good; this should come right after part 1.
>
> Now part 3, right after part 2:
>
> > +To use the module, include @code{hamt.h} in your code.  The public
> > +interface is documented in that header file.
>
> OK.
>
> And the rest are details that can come after part 3:
>
> > +In the current implementation, each inner node of the HAMT can store up
> > +to @math{32 = 2^5} elements and subtries.  Whenever a collision between
> > +the initial bits of the hash values of two elements happens, the next
> > +@math{5} bits of the hash values are examined and the two elements
> > +pushed down one level in the trie.
> > +
> > +HAMTs have the same average access times as hash tables but grow and
> > +shrink dynamically, so they use memory more economically and do not have
> > +to be periodically resized.
> > +
> > +They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
> > +   Hash Trees (Report). Infoscience Department, École Polytechnique
> > +   Fédérale de Lausanne.}
> > +
> > +HAMTs are well-suited to a persistent data structure, which means that
> > +each updating operation (like inserting, replacing, or removing an
> > +element) returns a new HAMT while leaving the original one intact.  This
> > +is achieved through structure sharing, which is even safe in the
> > +presence of multiple threads when the used C compiler supports atomics.
> > +
>
> By the way, I just starred at the Hamt_entry struct and wondered how to
> make use of it. The test class shows it. How about extending the comment
> a bit:
>
>    Each element has to be a
>    struct whose initial member is of the type Hamt_entry.
> ->
>    Each element has to be a
>    struct whose initial member is of the type Hamt_entry.  You need to
>    define this struct yourself.  It will typically contain a
>    Hamt_entry, a key, and optionally a value.
>
> Bruno
>
>

[-- Attachment #2: Type: text/html, Size: 10279 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] hamt: New module.
  2021-04-03 20:32       ` Marc Nieper-Wißkirchen
@ 2021-04-03 21:44         ` Bruno Haible
  0 siblings, 0 replies; 9+ messages in thread
From: Bruno Haible @ 2021-04-03 21:44 UTC (permalink / raw)
  To: Marc Nieper-Wißkirchen; +Cc: bug-gnulib

Hi Marc,

> thank you very much for all the helpful feedback. I have incorporated your
> suggestions, and you can find an updated diff below.

This is definitely better.

> +Lookup of a (key, value) pair given the key is an O(1) operation,

I forgot to say "on average" here. (It's not worst-case O(1).)

> On the other hand, a hash table is
> +simpler so that its performance may be better when persistence is not
> +needed.

The term "persistence" has not been explained at this point.

> +HAMTs are well-suited to a persistent data structure, which means that
> +each updating operation (like inserting, replacing, or removing an
> +element) returns a new HAMT while leaving the original one intact.

Here comes the definition of persistence. Do you want to move it upwards,
or is the talk about persistence redundant, now that we explain the
use-case in part 2?

Feel free to push. It's your module.

Bruno



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] hamt: New module.
  2021-04-03  9:25 [PATCH] hamt: New module Marc Nieper-Wißkirchen
  2021-04-03 17:02 ` Bruno Haible
@ 2021-04-05 13:02 ` Bruno Haible
  2021-04-05 13:16   ` Marc Nieper-Wißkirchen
  1 sibling, 1 reply; 9+ messages in thread
From: Bruno Haible @ 2021-04-05 13:02 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Marc Nieper-Wißkirchen

Hi Marc,

> * tests/test-hamt.c: New file.

Can you please have a quick look whether these Coverity findings (from our
weekly Coverity run) are relevant?


________________________________________________________________________________________________________
*** CID 1503613:  Null pointer dereferences  (REVERSE_INULL)
/gllib/hamt.c: 876 in bucket_do_while()
870       size_t cnt = 0;
871       size_t elt_count = bucket->elt_count;
872       Hamt_entry *const *elts = bucket->elts;
873       for (size_t i = 0; i < elt_count; ++i)
874         {
875           *success = proc (elts[i], data);
>>>     CID 1503613:  Null pointer dereferences  (REVERSE_INULL)
>>>     Null-checking "success" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
876           if (!success)
877             return cnt;
878           ++cnt;
879         }
880       return cnt;
881     }


________________________________________________________________________________________________________
*** CID 1503612:  Uninitialized variables  (UNINIT)
/gllib/hamt.c: 952 in hamt_iterator()
946       Hamt_iterator iter;
947       iter.hamt = hamt_copy (hamt);
948       Hamt_entry *entry = hamt->root;
949       if (entry == NULL)
950         {
951           iter.depth = -1;
>>>     CID 1503612:  Uninitialized variables  (UNINIT)
>>>     Using uninitialized value "iter". Field "iter.path" is uninitialized.
952           return iter;
953         }
954       iter.depth = 0;
955       iter.path = 0;
956       iter.position = 0;
957       while (iter.entry[iter.depth] = entry, entry_type (entry) == subtrie_entry)


________________________________________________________________________________________________________
*** CID 1503618:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
/gltests/test-hamt.c: 155 in ()
149       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
150       ASSERT (sum == 52);
151     
152       hamt1 = hamt_remove (hamt2, &x4);
153       sum = 0;
154       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
>>>     CID 1503618:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
>>>     use of "=" where "==" may have been intended
155       ASSERT (sum = 52);
156       sum = 0;
157       ASSERT (hamt_do_while (hamt1, proc, &flag) == 3);
158       ASSERT (sum  = 48);
159     
160       hamt_free (hamt1);



________________________________________________________________________________________________________
*** CID 1503615:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
/gltests/test-hamt.c: 158 in ()
152       hamt1 = hamt_remove (hamt2, &x4);
153       sum = 0;
154       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
155       ASSERT (sum = 52);
156       sum = 0;
157       ASSERT (hamt_do_while (hamt1, proc, &flag) == 3);
>>>     CID 1503615:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
>>>     use of "=" where "==" may have been intended
158       ASSERT (sum  = 48);
159     
160       hamt_free (hamt1);
161       hamt_free (hamt2);
162       free_element (y5);
163     }


________________________________________________________________________________________________________
*** CID 1503614:    (ASSERT_SIDE_EFFECT)
/gltests/test-hamt.c: 158 in test_general()
152       hamt1 = hamt_remove (hamt2, &x4);
153       sum = 0;
154       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
155       ASSERT (sum = 52);
156       sum = 0;
157       ASSERT (hamt_do_while (hamt1, proc, &flag) == 3);
>>>     CID 1503614:    (ASSERT_SIDE_EFFECT)
>>>     Assignment "sum = 48" has a side effect.  This code will work differently in a non-debug build.
158       ASSERT (sum  = 48);
159     
160       hamt_free (hamt1);
161       hamt_free (hamt2);
162       free_element (y5);
163     }
/gltests/test-hamt.c: 155 in test_general()
149       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
150       ASSERT (sum == 52);
151     
152       hamt1 = hamt_remove (hamt2, &x4);
153       sum = 0;
154       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
>>>     CID 1503614:    (ASSERT_SIDE_EFFECT)
>>>     Assignment "sum = 52" has a side effect.  This code will work differently in a non-debug build.
155       ASSERT (sum = 52);
156       sum = 0;
157       ASSERT (hamt_do_while (hamt1, proc, &flag) == 3);
158       ASSERT (sum  = 48);
159     
160       hamt_free (hamt1);



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] hamt: New module.
  2021-04-05 13:02 ` Bruno Haible
@ 2021-04-05 13:16   ` Marc Nieper-Wißkirchen
  2021-04-05 15:02     ` static analyzers Bruno Haible
  0 siblings, 1 reply; 9+ messages in thread
From: Marc Nieper-Wißkirchen @ 2021-04-05 13:16 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Marc Nieper-Wißkirchen, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 5028 bytes --]

Hi Bruno,

Am Mo., 5. Apr. 2021 um 15:02 Uhr schrieb Bruno Haible <bruno@clisp.org>:

> Hi Marc,
>
> > * tests/test-hamt.c: New file.
>
> Can you please have a quick look whether these Coverity findings (from our
> weekly Coverity run) are relevant?
>

Coverity seems to be a good tool. I haven't yet tested GCC's new static
analyzer.


>
> ________________________________________________________________________________________________________
> *** CID 1503613:  Null pointer dereferences  (REVERSE_INULL)
> /gllib/hamt.c: 876 in bucket_do_while()
> 870       size_t cnt = 0;
> 871       size_t elt_count = bucket->elt_count;
> 872       Hamt_entry *const *elts = bucket->elts;
> 873       for (size_t i = 0; i < elt_count; ++i)
> 874         {
> 875           *success = proc (elts[i], data);
> >>>     CID 1503613:  Null pointer dereferences  (REVERSE_INULL)
> >>>     Null-checking "success" suggests that it may be null, but it has
> already been dereferenced on all paths leading to the check.
> 876           if (!success)
> 877             return cnt;
> 878           ++cnt;
> 879         }
> 880       return cnt;
> 881     }
>

Correct finding. It should be "!*success".


>
> ________________________________________________________________________________________________________
> *** CID 1503612:  Uninitialized variables  (UNINIT)
> /gllib/hamt.c: 952 in hamt_iterator()
> 946       Hamt_iterator iter;
> 947       iter.hamt = hamt_copy (hamt);
> 948       Hamt_entry *entry = hamt->root;
> 949       if (entry == NULL)
> 950         {
> 951           iter.depth = -1;
> >>>     CID 1503612:  Uninitialized variables  (UNINIT)
> >>>     Using uninitialized value "iter". Field "iter.path" is
> uninitialized.
> 952           return iter;
> 953         }
> 954       iter.depth = 0;
> 955       iter.path = 0;
> 956       iter.position = 0;
> 957       while (iter.entry[iter.depth] = entry, entry_type (entry) ==
> subtrie_entry)
>
>
Irrelevant, but I guess the warning can be silenced by moving up the
initialization of "iter.path" and "iter.position".


>
> ________________________________________________________________________________________________________
> *** CID 1503618:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
> /gltests/test-hamt.c: 155 in ()
> 149       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
> 150       ASSERT (sum == 52);
> 151
> 152       hamt1 = hamt_remove (hamt2, &x4);
> 153       sum = 0;
> 154       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
> >>>     CID 1503618:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
> >>>     use of "=" where "==" may have been intended
> 155       ASSERT (sum = 52);
> 156       sum = 0;
> 157       ASSERT (hamt_do_while (hamt1, proc, &flag) == 3);
> 158       ASSERT (sum  = 48);
> 159
> 160       hamt_free (hamt1);
>
>
Correct observation.


>
> ________________________________________________________________________________________________________
> *** CID 1503615:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
> /gltests/test-hamt.c: 158 in ()
> 152       hamt1 = hamt_remove (hamt2, &x4);
> 153       sum = 0;
> 154       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
> 155       ASSERT (sum = 52);
> 156       sum = 0;
> 157       ASSERT (hamt_do_while (hamt1, proc, &flag) == 3);
> >>>     CID 1503615:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
> >>>     use of "=" where "==" may have been intended
> 158       ASSERT (sum  = 48);
> 159
> 160       hamt_free (hamt1);
> 161       hamt_free (hamt2);
> 162       free_element (y5);
> 163     }
>

Dito.


>
> ________________________________________________________________________________________________________
> *** CID 1503614:    (ASSERT_SIDE_EFFECT)
> /gltests/test-hamt.c: 158 in test_general()
> 152       hamt1 = hamt_remove (hamt2, &x4);
> 153       sum = 0;
> 154       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
> 155       ASSERT (sum = 52);
> 156       sum = 0;
> 157       ASSERT (hamt_do_while (hamt1, proc, &flag) == 3);
> >>>     CID 1503614:    (ASSERT_SIDE_EFFECT)
> >>>     Assignment "sum = 48" has a side effect.  This code will work
> differently in a non-debug build.
> 158       ASSERT (sum  = 48);
> 159
> 160       hamt_free (hamt1);
> 161       hamt_free (hamt2);
> 162       free_element (y5);
> 163     }
> /gltests/test-hamt.c: 155 in test_general()
> 149       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
> 150       ASSERT (sum == 52);
> 151
> 152       hamt1 = hamt_remove (hamt2, &x4);
> 153       sum = 0;
> 154       ASSERT (hamt_do_while (hamt2, proc, &flag) == 4);
> >>>     CID 1503614:    (ASSERT_SIDE_EFFECT)
> >>>     Assignment "sum = 52" has a side effect.  This code will work
> differently in a non-debug build.
> 155       ASSERT (sum = 52);
> 156       sum = 0;
> 157       ASSERT (hamt_do_while (hamt1, proc, &flag) == 3);
> 158       ASSERT (sum  = 48);
> 159
> 160       hamt_free (hamt1);
>
>
These are aftereffects.

I will push the fixes.

Thanks,

Marc

[-- Attachment #2: Type: text/html, Size: 8119 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: static analyzers
  2021-04-05 13:16   ` Marc Nieper-Wißkirchen
@ 2021-04-05 15:02     ` Bruno Haible
  0 siblings, 0 replies; 9+ messages in thread
From: Bruno Haible @ 2021-04-05 15:02 UTC (permalink / raw)
  To: Marc Nieper-Wißkirchen; +Cc: bug-gnulib

Marc Nieper-Wißkirchen wrote:
> Coverity seems to be a good tool.

Yes, it has found a number of mistakes in Gnulib code (handle leaks,
memory leaks, use-after-free bugs, invalid free()), partially in really
complex code that a human cannot easily review.

> I haven't yet tested GCC's new static analyzer.

In GCC 10, the static analyzer has so many false positives that, on a
codebase as mature a gnulib, it was a waste of time to use it. Let's
see how it evolves in future GCC versions. It may be reasonable on
first-year students' code, though — I haven't tried that.

Bruno



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-04-05 15:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-03  9:25 [PATCH] hamt: New module Marc Nieper-Wißkirchen
2021-04-03 17:02 ` Bruno Haible
2021-04-03 18:59   ` Marc Nieper-Wißkirchen
2021-04-03 19:55     ` Bruno Haible
2021-04-03 20:32       ` Marc Nieper-Wißkirchen
2021-04-03 21:44         ` Bruno Haible
2021-04-05 13:02 ` Bruno Haible
2021-04-05 13:16   ` Marc Nieper-Wißkirchen
2021-04-05 15:02     ` static analyzers Bruno Haible

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).