From: "Marc Nieper-Wißkirchen" <marc.nieper+gnu@gmail.com>
To: Bruno Haible <bruno@clisp.org>
Cc: "Marc Nieper-Wißkirchen" <marc.nieper+gnu@gmail.com>, bug-gnulib@gnu.org
Subject: Re: [PATCH] hamt: New module.
Date: Sat, 3 Apr 2021 22:32:22 +0200 [thread overview]
Message-ID: <CAEYrNrRPKTvktjYrQ5UbqR5Y8JABp1QXxbDhTTPukWDzxY6oRA@mail.gmail.com> (raw)
In-Reply-To: <1796868.jvDaOvkGpR@omega>
[-- Attachment #1: Type: text/plain, Size: 8786 bytes --]
Hi Bruno,
thank you very much for all the helpful feedback. I have incorporated your
suggestions, and you can find an updated diff below.
Thanks,
Marc
diff --git a/doc/containers.texi b/doc/containers.texi
index 15c915b93..9bc1ae43d 100644
--- a/doc/containers.texi
+++ b/doc/containers.texi
@@ -35,6 +35,9 @@ log
Gnulib provides several generic container data types. They can be used
to organize collections of application-defined objects.
+@node Ordinary containers
+@subsection Ordinary container data types
+
@multitable @columnfractions .15 .5 .1 .1 .15
@headitem Data type
@tab Details
@@ -599,6 +602,58 @@ For C++, Gnulib provides a C++ template class for each
of these container data t
@tab @code{"gl_omap.hh"}
@end multitable
+@node Specialized containers
+@subsection Specialized container data types
+
+The @code{hamt} module implements the hash array mapped trie (HAMT) data
+structure. This is a data structure that contains (key, value) pairs.
+Lookup of a (key, value) pair given the key is an O(1) operation,
+assuming a good hash function for the keys is employed.
+
+The HAMT data structure is useful when you want modifications (additions
+of pairs, removal, value changes) to be visible only to some part of
+your program, whereas other parts of the program continue to use the
+unmodified HAMT. The HAMT makes this possible in a space-efficient
+manner: the modified and the unmodified HAMT share most of their
+allocated memory. It is also time-efficient: Every such modification
+is O(1) on average, again assuming a good hash function for the keys.
+
+A HAMT can be used whenever an ordinary hash table would be used. It
+does however, provide non-destructive updating operations without the
+need to copy the whole container On the other hand, a hash table is
+simpler so that its performance may be better when persistence is not
+needed.
+
+For example, a HAMT can be used to model the dynamic environment in a
+LISP interpreter. Updating a value in the dynamic environment of one
+continuation frame would not modify values in earlier frames.
+
+To use the module, include @code{hamt.h} in your code. The public
+interface is documented in that header file. You have to provide a hash
+function and an equivalence relation, which defines key equality. The
+module includes a test file @code{test-hamt.c}, which demonstrates how
+the API can be used.
+
+In the current implementation, each inner node of the HAMT can store up
+to @math{32 = 2^5} entries and subtries. Whenever a collision between
+the initial bits of the hash values of two entries would happen, the
+next @math{5} bits of the hash values are examined and the two entries
+pushed down one level in the trie.
+
+HAMTs have the same average access times as hash tables but grow and
+shrink dynamically, so they use memory more economically and do not have
+to be periodically resized.
+
+They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
+Hash Trees (Report). Infoscience Department, École Polytechnique
+Fédérale de Lausanne.}
+
+HAMTs are well-suited to a persistent data structure, which means that
+each updating operation (like inserting, replacing, or removing an
+element) returns a new HAMT while leaving the original one intact. This
+is achieved through structure sharing, which is even safe in the
+presence of multiple threads when the used C compiler supports atomics.
+
@ifnottex
@unmacro log
@end ifnottex
diff --git a/lib/hamt.h b/lib/hamt.h
index 55ee964dd..25a0ad9f9 100644
--- a/lib/hamt.h
+++ b/lib/hamt.h
@@ -78,10 +78,13 @@ _GL_INLINE_HEADER_BEGIN
/************/
/* A hamt stores pointers to elements. Each element has to be a
- struct whose initial member is of the type Hamt_entry. An element
- is conceptually owned by a hamt as soon as it is inserted. It will
- be automatically freed as soon as the last hamt containing it is
- freed. */
+ struct whose initial member is of the type Hamt_entry. You need to
+ define this struct yourself. It will typically contain an
+ Hamt_entry, a key, and, optionally, a value.
+
+ An element is conceptually owned by a hamt as soon as it is
+ inserted. It will be automatically freed as soon as the last hamt
+ containing it is freed. */
typedef struct
{
#if GL_HAMT_THREAD_SAFE
Am Sa., 3. Apr. 2021 um 21:55 Uhr schrieb Bruno Haible <bruno@clisp.org>:
> Hi Marc,
>
> > what do you think of the attempt attached below?
>
> I believe that technical documentation of a feature should be written
> to answer the questions in this order:
> 1. What is it? (Short summary)
> 2. When would I use it? (Use cases)
> 3. How do I use it?
> 4. Further details.
>
> The idea is to get the reader understand rapidly whether they want to
> use the feature or not. That is the purpose of parts 1 and 2.
>
> For part 1:
>
> > +The @code{hamt} module provides a persistant version of persistent hash
> > +array mapped tries (HAMTs).
>
> "A persistant version of persistent ..." ?
>
> The term "persistent" is only explained much later.
>
> > A HAMT is an array mapped trie in which
> > +elements are stored according to the initial bits of their hash values.
>
> This is a technical detail, to be mentioned later.
>
> I would start in a different way. How about this?
>
> The @code{hamt} module implements the hash array mapped trie (HAMT)
> data structure. This is a data structure that contains (key, value)
> pairs (or just plain keys, if no value is needed). Lookup of a
> (key, value) pair given the key is an O(1) operation, assuming a
> good hash function for the keys is employed.
>
> The HAMT data structure is useful when you want modifications
> (additions of pairs, removal, value changes) to be visible only to
> some part of your program, whereas other parts of the program
> continue to use the unmodified HAMT. The HAMT makes this possible
> in a space-efficient manner: the modified and the unmodified HAMT
> share most of their allocated memory. It is also time-efficient:
> Every such modification is O(1) on average, again assuming a good
> hash function for the keys
>
> For part 2:
>
> > +A HAMT can be used whenever an ordinary hash table would be used. It
> > +does however, provide non-destructive updating operations without the
> > +need to copy the whole container On the other hand, a hash table is
> > +simpler so that its performance may be better when persistence is not
> > +needed.
> > +
> > +For example, a HAMT can be used to model the dynamic environment in a
> > +LISP interpreter. Updating a value in the dynamic environment of one
> > +continuation frame would not modify values in earlier frames.
>
> This is good; this should come right after part 1.
>
> Now part 3, right after part 2:
>
> > +To use the module, include @code{hamt.h} in your code. The public
> > +interface is documented in that header file.
>
> OK.
>
> And the rest are details that can come after part 3:
>
> > +In the current implementation, each inner node of the HAMT can store up
> > +to @math{32 = 2^5} elements and subtries. Whenever a collision between
> > +the initial bits of the hash values of two elements happens, the next
> > +@math{5} bits of the hash values are examined and the two elements
> > +pushed down one level in the trie.
> > +
> > +HAMTs have the same average access times as hash tables but grow and
> > +shrink dynamically, so they use memory more economically and do not have
> > +to be periodically resized.
> > +
> > +They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
> > + Hash Trees (Report). Infoscience Department, École Polytechnique
> > + Fédérale de Lausanne.}
> > +
> > +HAMTs are well-suited to a persistent data structure, which means that
> > +each updating operation (like inserting, replacing, or removing an
> > +element) returns a new HAMT while leaving the original one intact. This
> > +is achieved through structure sharing, which is even safe in the
> > +presence of multiple threads when the used C compiler supports atomics.
> > +
>
> By the way, I just starred at the Hamt_entry struct and wondered how to
> make use of it. The test class shows it. How about extending the comment
> a bit:
>
> Each element has to be a
> struct whose initial member is of the type Hamt_entry.
> ->
> Each element has to be a
> struct whose initial member is of the type Hamt_entry. You need to
> define this struct yourself. It will typically contain a
> Hamt_entry, a key, and optionally a value.
>
> Bruno
>
>
[-- Attachment #2: Type: text/html, Size: 10279 bytes --]
next prev parent reply other threads:[~2021-04-03 20:32 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-03 9:25 [PATCH] hamt: New module Marc Nieper-Wißkirchen
2021-04-03 17:02 ` Bruno Haible
2021-04-03 18:59 ` Marc Nieper-Wißkirchen
2021-04-03 19:55 ` Bruno Haible
2021-04-03 20:32 ` Marc Nieper-Wißkirchen [this message]
2021-04-03 21:44 ` Bruno Haible
2021-04-05 13:02 ` Bruno Haible
2021-04-05 13:16 ` Marc Nieper-Wißkirchen
2021-04-05 15:02 ` static analyzers Bruno Haible
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.gnu.org/mailman/listinfo/bug-gnulib
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAEYrNrRPKTvktjYrQ5UbqR5Y8JABp1QXxbDhTTPukWDzxY6oRA@mail.gmail.com \
--to=marc.nieper+gnu@gmail.com \
--cc=bruno@clisp.org \
--cc=bug-gnulib@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).