[RFC] Adding a real HashTable implementation to gnulib

bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed

* [RFC] Adding a real HashTable implementation to gnulib
@ 2018-11-26 23:16 Darshit Shah
  2018-11-26 23:22 ` Ben Pfaff
  2018-12-02 13:41 ` Bruno Haible
  0 siblings, 2 replies; 15+ messages in thread
From: Darshit Shah @ 2018-11-26 23:16 UTC (permalink / raw)
  To: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 867 bytes --]

I recently tried to use the hash table implementation in gnulib which resides
in the "hash" module. However, I quickly realised that the hash table in gnulib
seems to be what is otherwise popularly known as a hash set, i.e., it supports
storing and retrieving just values from the structure. 

On the other hand, a hash table is usually expected to have a key->value
mapping that is stored.

Within GNU Wget, we have a fairly portable version of a hash table implemented
which I think would be a good addition for gnulib. What do you think?

If I get a positive response here, I could extract that code and turn it into a
hash table module for gnulib. We should be able to reuse some part of the
existing code in "hash.c" for this purpose as well

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 894 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-11-26 23:16 [RFC] Adding a real HashTable implementation to gnulib Darshit Shah
@ 2018-11-26 23:22 ` Ben Pfaff
  2018-11-27  1:02   ` Darshit Shah
  2018-12-02 13:41 ` Bruno Haible
  1 sibling, 1 reply; 15+ messages in thread
From: Ben Pfaff @ 2018-11-26 23:22 UTC (permalink / raw)
  To: bug-gnulib

On Tue, Nov 27, 2018 at 12:16:16AM +0100, Darshit Shah wrote:
> I recently tried to use the hash table implementation in gnulib which resides
> in the "hash" module. However, I quickly realised that the hash table in gnulib
> seems to be what is otherwise popularly known as a hash set, i.e., it supports
> storing and retrieving just values from the structure. 
> 
> On the other hand, a hash table is usually expected to have a key->value
> mapping that is stored.
> 
> Within GNU Wget, we have a fairly portable version of a hash table implemented
> which I think would be a good addition for gnulib. What do you think?
> 
> If I get a positive response here, I could extract that code and turn it into a
> hash table module for gnulib. We should be able to reuse some part of the
> existing code in "hash.c" for this purpose as well

Can you point to the Wget hash table?

I'm pretty fond of the hash table implementation we have in PSPP:
http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/hmap.h
http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/hmap.c


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-11-26 23:22 ` Ben Pfaff
@ 2018-11-27  1:02   ` Darshit Shah
  2018-11-27  4:47     ` Ben Pfaff
  0 siblings, 1 reply; 15+ messages in thread
From: Darshit Shah @ 2018-11-27  1:02 UTC (permalink / raw)
  To: Ben Pfaff; +Cc: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 2044 bytes --]

Here are the links to the sources in the GNU Wget tree:

http://git.savannah.gnu.org/cgit/wget.git/tree/src/hash.h
http://git.savannah.gnu.org/cgit/wget.git/tree/src/hash.c

At first sight, the implementation in PSPP looks a lot more concise.
Also, it's usage of fewer preprocessor statements makes me already like it.

However, it does seem that this implementation is not as general or, complete
as the existing hash table implementation in gnulib. The version available in
GNU Wget does implement all the same interfaces with very similar usage
characteristics.

One of the things I noticed missing in the implementation in GNU PSPP is the
ability to grow the size of the table based on a certain threshold of
"fullness".

* Ben Pfaff <blp@cs.stanford.edu> [181127 00:41]:
> On Tue, Nov 27, 2018 at 12:16:16AM +0100, Darshit Shah wrote:
> > I recently tried to use the hash table implementation in gnulib which resides
> > in the "hash" module. However, I quickly realised that the hash table in gnulib
> > seems to be what is otherwise popularly known as a hash set, i.e., it supports
> > storing and retrieving just values from the structure. 
> > 
> > On the other hand, a hash table is usually expected to have a key->value
> > mapping that is stored.
> > 
> > Within GNU Wget, we have a fairly portable version of a hash table implemented
> > which I think would be a good addition for gnulib. What do you think?
> > 
> > If I get a positive response here, I could extract that code and turn it into a
> > hash table module for gnulib. We should be able to reuse some part of the
> > existing code in "hash.c" for this purpose as well
> 
> Can you point to the Wget hash table?
> 
> I'm pretty fond of the hash table implementation we have in PSPP:
> http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/hmap.h
> http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/hmap.c
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 894 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-11-27  1:02   ` Darshit Shah
@ 2018-11-27  4:47     ` Ben Pfaff
  0 siblings, 0 replies; 15+ messages in thread
From: Ben Pfaff @ 2018-11-27  4:47 UTC (permalink / raw)
  To: bug-gnulib

Much as I like the PSPP hmaps, it probably makes sense for any hash
table implementation in gnulib to match the existing code.

On Tue, Nov 27, 2018 at 02:02:17AM +0100, Darshit Shah wrote:
> Here are the links to the sources in the GNU Wget tree:
> 
> http://git.savannah.gnu.org/cgit/wget.git/tree/src/hash.h
> http://git.savannah.gnu.org/cgit/wget.git/tree/src/hash.c
> 
> At first sight, the implementation in PSPP looks a lot more concise.
> Also, it's usage of fewer preprocessor statements makes me already like it.
> 
> However, it does seem that this implementation is not as general or, complete
> as the existing hash table implementation in gnulib. The version available in
> GNU Wget does implement all the same interfaces with very similar usage
> characteristics.
> 
> One of the things I noticed missing in the implementation in GNU PSPP is the
> ability to grow the size of the table based on a certain threshold of
> "fullness".
> 
> * Ben Pfaff <blp@cs.stanford.edu> [181127 00:41]:
> > On Tue, Nov 27, 2018 at 12:16:16AM +0100, Darshit Shah wrote:
> > > I recently tried to use the hash table implementation in gnulib which resides
> > > in the "hash" module. However, I quickly realised that the hash table in gnulib
> > > seems to be what is otherwise popularly known as a hash set, i.e., it supports
> > > storing and retrieving just values from the structure. 
> > > 
> > > On the other hand, a hash table is usually expected to have a key->value
> > > mapping that is stored.
> > > 
> > > Within GNU Wget, we have a fairly portable version of a hash table implemented
> > > which I think would be a good addition for gnulib. What do you think?
> > > 
> > > If I get a positive response here, I could extract that code and turn it into a
> > > hash table module for gnulib. We should be able to reuse some part of the
> > > existing code in "hash.c" for this purpose as well
> > 
> > Can you point to the Wget hash table?
> > 
> > I'm pretty fond of the hash table implementation we have in PSPP:
> > http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/hmap.h
> > http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/hmap.c
> > 
> > 
> 
> -- 
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-11-26 23:16 [RFC] Adding a real HashTable implementation to gnulib Darshit Shah
  2018-11-26 23:22 ` Ben Pfaff
@ 2018-12-02 13:41 ` Bruno Haible
  2018-12-02 15:05   ` LRN
  2018-12-03  9:13   ` Tim Rühsen
  1 sibling, 2 replies; 15+ messages in thread
From: Bruno Haible @ 2018-12-02 13:41 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Darshit Shah

Hi,

Darshit Shah wrote:
> I recently tried to use the hash table implementation in gnulib which resides
> in the "hash" module. However, I quickly realised that the hash table in gnulib
> seems to be what is otherwise popularly known as a hash set, i.e., it supports
> storing and retrieving just values from the structure. 
> 
> On the other hand, a hash table is usually expected to have a key->value
> mapping that is stored.

I agree that the gnulib 'hash' module is just a particular case, and
probably the module name is not very descriptive.

> Within GNU Wget, we have a fairly portable version of a hash table implemented
> which I think would be a good addition for gnulib. What do you think?

There's not only the one from wget
  https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.h
  https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.c

but also the one from gettext
  https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.h
  https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.c

and the one from glib
  https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.h
  https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.c

and the one from libxml
  https://gitlab.gnome.org/GNOME/libxml2/blob/master/include/libxml/hash.h
  https://gitlab.gnome.org/GNOME/libxml2/blob/master/hash.c

and the ones from CLN
  https://www.ginac.de/CLN/cln.git/?p=cln.git;a=tree;f=src/base/hash

and many more.

The implementation you are proposing is an "open-addressed table with linear
probing collision resolution". To me that is unacceptable. When I used Kyoto
Common Lisp (KCL) many years ago, I got an endless loop during a hash table
access, and it was precisely because of this open-addressed table structure.
I don't want a code which requires careful setting of parameters in order
not to run into an endless loop. Instead better have a code that cannot run
into an endless loop *by design*.

The hash_string function that you propose shifts by 5 bits at each step;
I suspect that it has the same problem as the one I tested and discussed in
https://haible.de/bruno/hashfunc.html .

For Gnulib, I would want a generic container, i.e. a "map", like we have
"list" and "ordered set" already (modules 'list' and 'oset'). Other GNU
maintainers have reported that they like this approach.

However, this will still not fit all possible needs because there are
special cases that people want to see optimized:
  - The case when the key is a string; additionally when the key is
    allocated in an obstack and there is no remove.
  - The struniq function (as in localename.c).

Then, what about extra requirements?
  - The existing gnulib 'hash' module is pretty unique: it keeps statistics.
    But is anyone really using this feature?
  - malloc vs. xmalloc.
  - Multithread-safety should IMO not be considered as an extra requirement.
    This is better done in application logic, because typically in the scope
    of the lock the application will do more than just the hash table lookup.

Bruno



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-12-02 13:41 ` Bruno Haible
@ 2018-12-02 15:05   ` LRN
  2018-12-03  9:13   ` Tim Rühsen
  1 sibling, 0 replies; 15+ messages in thread
From: LRN @ 2018-12-02 15:05 UTC (permalink / raw)
  To: bug-gnulib


[-- Attachment #1.1: Type: text/plain, Size: 1330 bytes --]

On 02.12.2018 16:41, Bruno Haible wrote:
> Hi,
> 
> Darshit Shah wrote:
>> I recently tried to use the hash table implementation in gnulib which
>> resides in the "hash" module. However, I quickly realised that the hash
>> table in gnulib seems to be what is otherwise popularly known as a hash
>> set, i.e., it supports storing and retrieving just values from the
>> structure.
>> 
>> On the other hand, a hash table is usually expected to have a key->value 
>> mapping that is stored.
> 
> I agree that the gnulib 'hash' module is just a particular case, and 
> probably the module name is not very descriptive.
> 
>> Within GNU Wget, we have a fairly portable version of a hash table
>> implemented which I think would be a good addition for gnulib. What do you
>> think?
> 
> There's not only the one from wget but also the one from gettext and the one
> from glib https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.h 
> https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.c
> 
> and the one from libxml and the ones from CLN and many more.
> 

There was a hashtable shootout[1] recently, with a followup[2] (although that
one is glib-specific):

[1]: https://hpjansson.org/blag/2018/07/24/a-hash-table-re-hash/
[2]: https://hpjansson.org/blag/2018/08/29/what-ails-ghashtable/


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-12-02 13:41 ` Bruno Haible
  2018-12-02 15:05   ` LRN
@ 2018-12-03  9:13   ` Tim Rühsen
  2018-12-04  2:32     ` Bruno Haible
  1 sibling, 1 reply; 15+ messages in thread
From: Tim Rühsen @ 2018-12-03  9:13 UTC (permalink / raw)
  To: Bruno Haible, bug-gnulib; +Cc: Darshit Shah


[-- Attachment #1.1: Type: text/plain, Size: 4938 bytes --]

On 12/2/18 2:41 PM, Bruno Haible wrote:
> Hi,
> 
> Darshit Shah wrote:
>> I recently tried to use the hash table implementation in gnulib which resides
>> in the "hash" module. However, I quickly realised that the hash table in gnulib
>> seems to be what is otherwise popularly known as a hash set, i.e., it supports
>> storing and retrieving just values from the structure. 
>>
>> On the other hand, a hash table is usually expected to have a key->value
>> mapping that is stored.
> 
> I agree that the gnulib 'hash' module is just a particular case, and
> probably the module name is not very descriptive.
> 
>> Within GNU Wget, we have a fairly portable version of a hash table implemented
>> which I think would be a good addition for gnulib. What do you think?
> 
> There's not only the one from wget
>   https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.h
>   https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.c
> 
> but also the one from gettext
>   https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.h
>   https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.c
> 
> and the one from glib
>   https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.h
>   https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.c
> 
> and the one from libxml
>   https://gitlab.gnome.org/GNOME/libxml2/blob/master/include/libxml/hash.h
>   https://gitlab.gnome.org/GNOME/libxml2/blob/master/hash.c
> 
> and the ones from CLN
>   https://www.ginac.de/CLN/cln.git/?p=cln.git;a=tree;f=src/base/hash
> 
> and many more.
> 
> The implementation you are proposing is an "open-addressed table with linear
> probing collision resolution". To me that is unacceptable. When I used Kyoto
> Common Lisp (KCL) many years ago, I got an endless loop during a hash table
> access, and it was precisely because of this open-addressed table structure.
> I don't want a code which requires careful setting of parameters in order
> not to run into an endless loop. Instead better have a code that cannot run
> into an endless loop *by design*.
> 
> The hash_string function that you propose shifts by 5 bits at each step;
> I suspect that it has the same problem as the one I tested and discussed in
> https://haible.de/bruno/hashfunc.html .
> 
> For Gnulib, I would want a generic container, i.e. a "map", like we have
> "list" and "ordered set" already (modules 'list' and 'oset'). Other GNU
> maintainers have reported that they like this approach.

We have 'hashmap' [1] in GNU Wget2. It does not work with linear probing
collision resolution, but with a linked list as fallback for collisions
(that could be replaced by a different algorithm).

So it's a key/value store where you can have NULL as value - in case you
just want to know if a key exists or not.

You apply your own hash and compare function when you create a hashmap
instance. So it's up to the dev to decide for an optimal hash strategy
for a certain use case.

You can add entries (key+value) with or without malloc/copying.

You can easily make wrappers around hashmap for special purposes, e.g.
we have a 'stringmap' API for case sensitive and insensitive keys [2].

The API is documented at [3] (hashmap) and [4] (stringmap).

It's a naive implementation but fast enough for out purposes (70ms to
read in the german translation of the holy bible (both parts) and count
unique words).

Surely many details have to change, but I am absolutely willing to
accept critics and work together with you - and anyone else who likes -
to get it into shape. The API hasn't been released, so we are currently
pretty flexible with changes.


[1] https://gitlab.com/gnuwget/wget2/blob/master/libwget/hashmap.c
[2] https://gitlab.com/gnuwget/wget2/blob/master/libwget/stringmap.c
[3] https://gnuwget.gitlab.io/wget2/reference/group__libwget-hashmap.html
[4] https://gnuwget.gitlab.io/wget2/reference/group__libwget-stringmap.html

> 
> However, this will still not fit all possible needs because there are
> special cases that people want to see optimized:
>   - The case when the key is a string; additionally when the key is
>     allocated in an obstack and there is no remove.
>   - The struniq function (as in localename.c).
> 
> Then, what about extra requirements?
>   - The existing gnulib 'hash' module is pretty unique: it keeps statistics.
>     But is anyone really using this feature?
>   - malloc vs. xmalloc.
>   - Multithread-safety should IMO not be considered as an extra requirement.
>     This is better done in application logic, because typically in the scope
>     of the lock the application will do more than just the hash table lookup.

There are no stats in Wget2's implementation. The malloc/free functions
are currently not configurable (but that's trivial).

Regards, Tim


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-12-03  9:13   ` Tim Rühsen
@ 2018-12-04  2:32     ` Bruno Haible
  2018-12-04 19:36       ` Tim Rühsen
                         ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Bruno Haible @ 2018-12-04  2:32 UTC (permalink / raw)
  To: Tim Rühsen; +Cc: bug-gnulib, Darshit Shah

Hi Tim,

> We have 'hashmap' [1] in GNU Wget2.

Things I like about the implementation:

  - Collision resolution through linked list (a robust algorithm).

  - Reasonably generic API.

  - The user-definable destructor functions.

Things that could be improved:

  - The ability to set a growth policy > 0. This leads to quadratic
    run time behaviour. I mean, you make such an effort to have O(1)
    access on average, and then the fixed-size increment of the size
    turns the insertion operation into an average-O(n) operation.

  - Some functions take keysize and valuesize arguments, some don't.
    I'm confused.

  - NULL values are special. The documentation says "If there are NULL
    values in the stringmap, you should use wget_stringmap_get_null() to
    distinguish between 'not found' and 'found NULL value'." I would
    prefer an API that does not require me to think about whether I have
    null values in the map or not.

  - To iterate through the table, you need to define an instance of the
    wget_hashmap_browse_t function. I love functional programming, but
    C is not the right language for that. If the inner part (the body
    of the 'browse' function) needs to access outer variables, the
    programmer needs to manually create a "context" struct and fill it -
    things that the compiler would be doing in other programming languages.
    Some people say that the solution to this problem are the nested
    functions supported by GCC, but
      1. This is not portable; only GCC supports this.
      2. On many CPUs (including x86_64), the use of nested functions
         requires to disable on the entire process a security feature
         (namely, stacks without execute permission).
    I therefore prefer the concept of "iterator" objects that allow
    the programmer to step through the hash table without contorting
    their code and without compromising on security.

Bruno



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-12-04  2:32     ` Bruno Haible
@ 2018-12-04 19:36       ` Tim Rühsen
  2018-12-04 19:40       ` Tim Rühsen
  2019-01-03 16:32       ` Tim Rühsen
  2 siblings, 0 replies; 15+ messages in thread
From: Tim Rühsen @ 2018-12-04 19:36 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib, Darshit Shah

[-- Attachment #1: PGP/MIME version identification --]
[-- Type: application/pgp-encrypted, Size: 11 bytes --]

[-- Attachment #2: OpenPGP encrypted message --]
[-- Type: application/octet-stream, Size: 6696 bytes --]

-----BEGIN PGP MESSAGE-----

hQIMAxQOqHn5pNEvARAAw76+vWvATjI+03AHdi7YwWvTA7TyOjyLCqmJyHffLWjS
3j+LGgTRoiEHjnyD7LSfEQjvxnMMY+X+CP/cYRCAJmClIRFMxTSBRGkrotfGI7tS
cT5YtaYx1i/HLTiSr3O1mhiFCl2VK5NKWfjpJpoKP/L7r//hV6775+ISWo0mZOrj
RPuIQxcJfGOCcO70wmLd3ZJocOO0F+cgypmze3qLedyYIW5KzHVbSdk8y1DjHrM9
C+hMZmWDjbMh8ml1GG4hvkfFmeeEdoIGV6ilKRcU/IDsQCyg1k5FCzyQ28bWVZja
scr2rVUrtmhhhStmYRd1aI21wulncKdEULrkxZaPRK1J1onzNUxXbNrG4wWJRhIW
5WR7xYreKpzHUjjiqyfQaSd/w1/apIcXZxZE8eiwGwBbOJ7bUmIzFPW/vwvPr2oC
OnUOn3JQfH7Jf68HWjOXsyKTOI3X89iQOpqNvqLmGYoBRIdbiI8BP46lTQqxmlZo
PZZDE+/YK+F9jrh6H6tTPaapuWCp08AUr3fajbjqgdDZcyDiZXJ83O0XH11rV58r
yY3+WuBZAkt81eo8vsG2roDL02jRiAadjy1J6FuftbUEJue757THqXkqtvBUZ8US
Ei81fgSYpCPBLk0c/lqHRK59rXeO4+B/zLlWb+jVZhFI4ha/A+Xlrv9XTStJjCSF
AgwD6SreaCbfFBABD/4tp6cLkHe/EL9KPSOSUk5ZpCwcUzeEqGGUYYZrSVr+rzhD
wtK37VnFWVkBHYf4ZezFb52S94l1Cl/6yE8fUj7UBe6V0MBJ3PTOMVoIb5sxWCIY
XU+zllwvkyqlU7ADuoAT/s2eMUeahbrANUJuVkwCfkaMF6O5Wi8tppgj0XS5sZbj
t5bL4oDz7ivLQslnvYrwTUhTz32Vg3x6YFilcwNKr37kVvdlz95JJH43QsbiJVeM
zo4PRIX7mV0GEJRjAYeCInI1WGI3Z9cZkGZ0AdNSTi4D2JQVDE023jUI/X5ybAz7
RFK5BCzzb+7jzK7iiHP26TvpWVFQi6ugLMLStD4fcNemA7ywn5LaNfm/XAwUx+4L
ZCq0RylTF8XFmY28GWsPOy2zA/B66Sk1CMLj//8W1mX+T2KcjL9k89Yre/5OdJhU
tJ1BDfAP3Qjyl54XytXJkLMZZzvqCM7cvQsIr03fsF4MsmsplKo92xRaYoKRX+aK
zoIP9o1U1dHlVF+CnMrPCB7kdtomoFDtoaonnrFpWLXX0vym9VDyo6KRj7B1k79F
RX4wpKXH5+u+IJnpcXUyB+TpZI189GHE59EXZeWPtbYb+mpFHV6NXLeP60zDH+2K
8cASjRAsHtcgY6NCFLtYzQpJCYdO5SwVxzMqsosyaDCWux2+Ex/IfgbONauCX4UC
DANYg+75Ck/BzwEP/jzs2hPLnX1tiwagCKQXdIPm8ZoO8OFZhy8Rq2kUlQ+ZK2GM
xm/lMq1fLX6OfUXk7ZVj7gyR8qyMt3TNiAPTnXENbje+6VEjGHHHqGgRDxoJLdrt
IfWffdFlWBWXBqtck16BX1uViWLMIm+pQ/meZOZiomPaRCj2TmM3py2fPiSKT4sV
KmTBbUsEMJ2t8edwrPqNXwK1YNzjVw90rebiF5HQ3uXQi44KvHKWGuGlXXBy+L3/
WbiTajc5OYgmYtNYRpTAvNMGsjmi5c+pj906k9AoNHAXFNBx7/wtAxlGCgl6i20T
xqI/euugR/wN7ji14ThI2ZvAdaMFcu9sfAu9DoHrocyh8w2ZdRUKa97C157Mi0PB
2qIkvTOQ7lrCeacQ6DTtuqo5KTz2+HFF9H8TTe7NtydrWd9nXzZiZ9FPsgSEVu6O
OjzstzP/YiDSosfAbJWP2ZLt2sKXN/yhUMdmGJT/fd5F09gLyeHNtqbGyC92boOR
azYuWruUw+wBHY2BODBhPvutrzR79TwkEhZKiQoqdNJrqQ9MhqzkwyJjOkmJi9uw
Dvqw32lgtlT0CJ4Nvrifw4sKNLcHdosOw9+ioiDB2RLC/Sx5V7qEpyKAVMMZDIgs
uaMUnKQpQaALPLNJ/i3QSaszcyBzgMRg90L0ewskKuk2c+3ccZ0HX/92t7b/0usB
UuBqLQIfKphvWvc2T/Aw21SJomahSxtMv2Qr0CZxD9taVsiiRkBfq387+fsg8YmZ
sz4vImDZWVkXJIlq38389T3B8cJIKlLCp3V0/PhLNuC2vgg+CP/Cf4LD+zCr0lSc
xxnVojg4BvDakHORo+/4RyGnonTNJ8Lk4D8yE9rg/fRH4XC00E4wmr7hrrtv4Hsf
Nona/U35ZvfBciAZfa+X1SjCKVkUNDHlXOEzkmeBd/htIUI2J4ekNPVgCVg5Hr+L
u8/eEW9MSs3gKt632VZxEVtdHEzx6bwGAsTeY+rYBjFIRFIRQhbdpiq4zmWG84Tm
StRjrglkinNsSHaSm5KY1Cb6VcqSUKreUgW5y2+bVjFzQaW08CDruVH28tJzQPpu
92AcnIFsZLruiCZgf/a5iayWKcXqAnHJO1yZKCfw4i5eDf+W7n/U5JUoE79gMzF5
ruKKqUUzKFzJsNFQCDUewO1jOW9dOlsaoSZ9ySraeficSveKpWg1EtFkuTnNqTx1
X4yuhfk4WonkeOiBjPfgeuYxsmhf7Wlv3xGnt+9vlITpbjOAB0PLD8Y9k7v1HEbu
dW8Fg/aiROpSulaZEK6Me7R4D5wnpBX1a2AO7XHxAFyzTlBlSSjzNY8C/s+0o5k2
TzffVdA11LdHmK1ywrw1/6GOMa86Sx1pmJdn4KqcGEP1vLdlKUn2jG0EQPA9c6EJ
B82hZR0zA6fUFY09sD6JunAzaHhkoshzXwkXuBLy/yxXibgOhpCUtu1L0QH7iIl2
nq6vJL0gP9l8eN/02YooIDOl2wA8fgRkm0E389DAANLOSBCsXTpVn4rYz/Bcp17R
nL+cGecIs2GS4Vc/urUG624ipBsi2a+rIKpIkAdjRlIdvOWqXEvLdmr4nZ8df35d
FWzCZSb4Fo4GOAx946A57dI5N5Vs10JezMhTF6LMJdPCxIxScfieoDmos/hlKGMQ
Gme10k8orovrHEbxmlyfec6cd0AMIo4PCk2STrRd/QP7Aocy1Fb7hCri3rvTJNms
Fqb07AeTk+uzXIvImuIavHGUlOG6wdhITUJ8OfczUC22mllzg4uQnOLSJFs6z2VC
iCig2GczBpLHt8p5yZgDvjupIGtKHG/XQc4UPiUi5E/adR29+DruQogAtkDq/dc0
0xVkfMNm5kGm0YRTvhQ8QP9VNPaB6jrzDHz8EjGeOCyGsC+5Ysm6vPYwVT7Jkmq8
fJ2VzW862uLlQWF8jVKOQ2CcR8W/YCi3qMjS7iK1VRGsla4Noo7YQ2mGGgacqUPX
wyHF6rscS2iVjIBpbpaajpCHd9qeq2DNPVAf2cBYv3lmJ9O6MzAaSe621iyPmX2i
aXYf8k3vRb0DPTn5v9mE3HmcREisbPDT3DhKXgIKfzKHEUwe1B3FHk78fQB8cspN
4a/b55Ozq174+vyOG+Vh7HYW3yf7WIEC4cpUQzhJ5hmY/q+2bYKhBDGc+7YEiynB
UT6mvFGHaklL8R8uxbbZSFvVeOGdVGLvOIFP8jeQGUyl1nYpg+w/hWbLFJ4IIPHC
JJ265WU3UeSiPcuk7YAnA77CATHU3kBGo5p6ij1Svf9pSPpANDplqe5kofClkP31
zsvIhvYqDOMdpsIWpmoaI9Y4JRaiYxbL32fF0DOxpGy0NYMvKc7GlMQAcx8T1Jk/
7YtPUurdeyvhENOtIH+yHy6sbyquVW68DcxLLyC4DWHs57+pdl6AFw3A6ArKDYd3
dfhrM36da1HECgzCB/kuimkcDlZXVpj8SoBViyvGWDKj4/aiWN/NtFVpdeQvneqX
O17HH9kLxo9CsVnmKzYm0QAeAHziLI+T1wXobwy9oUzpR4rfnHEKtK4Pn5Md/1Ud
Acjjv8iLoPjXH6EWjbwBlNQOGG41t/Z7PdrCTwuN0HCdNsb/EHB8mhveaRFfSXfY
RQD0lmCPDyOCXaRfvKJMsQAYkDXoBBzeFE5asTR63w4XCblx3mSB3x1mpHqvAz2J
HTuB5tiyOo4ANvJ8XDVu8i5eD1OaGG23neKjPfkG19fQ9pTXYiDzow3naZQXIzJO
mVTf+f3MPG7aXDng/vLEJWHVv1f36ILOIM2M4XL25DJ5CPc9S9Lckjy77IVl5S+z
BjzCVDlI4mRS1+d4NhwFbSVvxi3cU/iJWD3c2+XuzTQzlvKjXB2odYsaiDL7Nq9P
RH7FMi1dq1WowyOtLMwR8YJAZmKbVF+wAUO1iJwD1J52qdx59f6wn0H4kT7cG/JS
CaLUxCHP2RCbEar8ywwO/0As4b46JyeEVBDqC+ndDqnK/iZmBoGjthMygOcLLxqV
AyEJ2EuxANjBaO/pq65EJj2ROudBwe5DaWRqlZVOk2o1rrzDqR+8NnzEJpsmV11N
i3hsKgpYOg6vBWrPFKq0A3qJq088oztU4jwLOhNbw86PfcPSxrczuiwW3GtZGEwi
mmeCPInSaQR/MN/pDPFC9LCwhTqR0tevDmky3lUOLeOpMmkWqLnya3F35wtImawp
XHko8Uo56KL8gGnN71VlprvM3XbHWadD8kTSXYscBM0M2oK978zyjShxPr9B7Y7M
dlrYgxZuiPT+IOMeV5rXAJyJsGJdGopzte3wwaMx0W66JTTn6hILCiZ1mLZ/LEv5
MB3dI9qIEHCnJrgdllpecyr6kDDByzsda1jt9eTyI8tO68I1CjDvAfPLgbr5WO5w
F/EBOby67xmqWvlx22zdmTgRSya2xe7BWfqX7uibOOoPnzLmwdtI9BN61WJRHDzo
Aqz5B7wHzdmPDxy87a+9aK0mpjfWOvq3KhPa8TKFP38hovSYCB+sDpSy22nhqnZb
AYkHl4gc8DB9F4OuBv+g8LdYVVSe1768ogOW23q51fALvlGNuNN8Z2NTddgQfQ5u
EudIT4uUhxmn90kYNZaPVzbf4lt3mo5o/VdK+gsTeUEIEReeaBFepW1h5Rdq2GbV
vQ3wPfP5p/6pOfOHfSBfOC5Rdt2Hzdb4RZwIhGV8i3S1xf/QgG34uo0n85RGzfQt
I2GPvXgPTCQ4MNsiJZLpK7s7DpBCoHAWlPuXgK2Fj2TxCBztKwfeC7Xp3edR4FXJ
L8pae/49bZz6zsEKRbYVlG0XMZSFOaWQ92Hfs6rLoDopFcurQkGlmlE3iQoJMfw9
fcl027nwdTJwhmu1EBZ9DOoLH3Us8a/beVUDzq5uHpkWliHnW5/w86PaC91sljxy
8AzRTXYeItL4ZVzlwKolzL35Kk5/b0VCXm4A0B36+/gYhWKZtDi9s7P9Zse9oKgr
pb/EJFjO+23K94DTJZjdthC9SN4lKOnQXvV73pf5Kwe5W0NAqBshkpe8Tl8uR7rm
UtEYp0g8u572LGa1OlGWftPZ/rI17dySoB++gxV7Vx6Xb37Knza7KrrUsiprJYmg
mtRYD7/PSjPGxT/pTKd4k7VcUcdb1/7LfzYAggceSxUsmWnX53zVmPOyMwwUgDJT
3QzbKt+jDzNQ+3JLaMmm5vOX7lDwdm77uNjSIzqyS0NWdMl+yPZ+nsYpI+SWfAlM
S/htrn+moBbwrDyQ/Lss6xk/ABUOTOBVKJZ/1Vwwrn9S+/qbGrzG8F5bKFaUuC6l
5e58IAAKI5HJNReWsYqIE0xQxRLcBVnJJ1wiGggYVc3GETMVtaRp7GFhK8zBr7YU
xpYIU+CLzsB/KmZtQR7cnZiAxVUIH6egXR5OtthWkaAdsyMN5J12d/68x6UTHnzC
GdEgThVQBJdp1bRHP/siqwaubQGCCpCpZCghqs1OdPcbHgRQqJcj+YN7t3hi6ull
FvZiEdmNKzcPro6ID3tZIigbrUHKXDRgByZfEuqXwhRgTQhKMZHaCHwx44BCSSQg
0Ozs9APUdQno0DvwBFRwGRoDd+5bPLfriYsl+LPpYXUQSrHigzWS2A52u3/dxLA+
vVP/BhLO1zUqCkJJoTaM0VitTZ5IptgZC4hjuYe0yoKvKFTDCLGo+xdSizHXvKhl
X8NiwLtMASzKWMxk8Z+84oPh8ndEsz+sMYhL1Sr+p0Qy3IhPsLPL3riFPmzSIncI
bc62omUyNX3br9lROT4w+6V02eOr4KK6DbczFYDeowOW0fULqWXDPQylU3IL53ht
wDHz984FKixPlcM55PH94bz/jkk3PnY9zINag+tWrcFLm39c7o2i6DqNOYMGPp6b
NxKvwwS8t0SFQcHg0T0Doih8OBsxWcMjyewvx9lV0PPJuBs7+9Kf38h35uvtc5LR
BBUA9rJAMQIIFLg0Amz1kVk9J4nNyiDIcZ9J1zlnLwnpbPRptCj+xCZdF/l0lCoc
zjssDgKxxIlKbkArg5KbhceyT4dgpyXg/JD6rLtcvJt+sEINW/+TOIYZtxeP7zUw
3AbNnIKZLSE/HWjNKVxEDKu3TWiS3h2dMdEF5ihk+zhZd27LloogzuGe6Etl1jkO
i0ml
=IWn5
-----END PGP MESSAGE-----

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC] Adding a real HashTable implementation to gnulib
  2018-12-04  2:32     ` Bruno Haible
  2018-12-04 19:36       ` Tim Rühsen
@ 2018-12-04 19:40       ` Tim Rühsen
  2018-12-04 23:43         ` Bruno Haible
  2019-01-03 16:32       ` Tim Rühsen
  2 siblings, 1 reply; 15+ messages in thread
From: Tim Rühsen @ 2018-12-04 19:40 UTC (permalink / raw)
  To: bug-gnulib

[-- Attachment #1.1: Type: text/plain, Size: 4301 bytes --]

Hi Bruno,

thank you so much for taking a look :-)

(sorry, unencrypted to the ML now)

On 04.12.18 03:32, Bruno Haible wrote:
> Hi Tim,
> 
>> We have 'hashmap' [1] in GNU Wget2.
> 
> Things I like about the implementation:
> 
>   - Collision resolution through linked list (a robust algorithm).
> 
>   - Reasonably generic API.
> 
>   - The user-definable destructor functions.
> 
> Things that could be improved:
> 
>   - The ability to set a growth policy > 0. This leads to quadratic
>     run time behaviour. I mean, you make such an effort to have O(1)
>     access on average, and then the fixed-size increment of the size
>     turns the insertion operation into an average-O(n) operation.

In Wget2 we use reasonable default sizes for the hashmap to make
resizing more or less a corner case.

Currently we have
- positive integer (N): resize by old value * N
- negative integer (N): resize by old value + (-N)

How can we do it better ?
- using a positive floating point value N ?
- for flexibility, allow to set a callback function to "calculate" the
new size ?

Or anything else ?

>   - Some functions take keysize and valuesize arguments, some don't.
>     I'm confused.

The "noalloc" functions just store the given pointers for key and value,
that means without allocation. The hashmap takes ownership of the
objects and has to release them when being removed/destroyed (or use a
NULL destructor when the values are static). This can be used to
generate a search index for an existing bunch of values (objects /
structures). Like a "unique index" for a field in a database table.
Example function: wget_hashmap_put_noalloc()

Then we have the functions that do (shallow) allocations/clones, like
wget_hashmap_put(). This is for using the hashmap as a container, the
original objects have to be (shallow) released outside of the container.

In Wget2 we use both types of functions.

Of course we could also have a function callback for cloning deeply,
maybe one for the keys and one for the values. Does that make sense ?

>   - NULL values are special. The documentation says "If there are NULL
>     values in the stringmap, you should use wget_stringmap_get_null() to
>     distinguish between 'not found' and 'found NULL value'." I would
>     prefer an API that does not require me to think about whether I have
>     null values in the map or not.

Agreed, it had some historic reasons that I currently don't remember.
I'll review that functions.

>   - To iterate through the table, you need to define an instance of the
>     wget_hashmap_browse_t function. I love functional programming, but
>     C is not the right language for that. If the inner part (the body
>     of the 'browse' function) needs to access outer variables, the
>     programmer needs to manually create a "context" struct and fill it -
>     things that the compiler would be doing in other programming languages.
>     Some people say that the solution to this problem are the nested
>     functions supported by GCC, but
>       1. This is not portable; only GCC supports this.
>       2. On many CPUs (including x86_64), the use of nested functions
>          requires to disable on the entire process a security feature
>          (namely, stacks without execute permission).
>     I therefore prefer the concept of "iterator" objects that allow
>     the programmer to step through the hash table without contorting
>     their code and without compromising on security.

Nested functions are really nice, I used to use them a lot. But the
concept of needing executable stack memory is truly crap. As you say,
the callback functions needs a context variable. But you don't have to
use it, e.g. when you just need to set a variable for all objects in the
hashmap. In that case a NULL value is allowed.

Of if you want to print out all objects into a file, your 'context'
could just be a FILE * or a file descriptor.

We have several use cases in Wget2 for wget_hashmap_browse() which makes
code more readable than using an explicit iterator.

Still I agree that for a general purpose hashmap, there must be an
iterator type/object + functions. I will add such and report back.

Regards, Tim

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-12-04 19:40       ` Tim Rühsen
@ 2018-12-04 23:43         ` Bruno Haible
  2019-08-26 14:24           ` Tim Rühsen
  0 siblings, 1 reply; 15+ messages in thread
From: Bruno Haible @ 2018-12-04 23:43 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Tim Rühsen

Hi Tim,

> Currently we have
> - positive integer (N): resize by old value * N
> - negative integer (N): resize by old value + (-N)

It's currently the opposite:
- positive integer (N): resize to old size + N
- negative integer (N): resize to old size * (-N)

> >   - The ability to set a growth policy > 0. This leads to quadratic
> >     run time behaviour. I mean, you make such an effort to have O(1)
> >     access on average, and then the fixed-size increment of the size
> >     turns the insertion operation into an average-O(n) operation.
> 
> In Wget2 we use reasonable default sizes for the hashmap to make
> resizing more or less a corner case.

The problem is that when you guessed wrong about the "corner case",
the rehash will consume 99.9% of your program's run time.

> How can we do it better ?

1. Remove the ability to "resize to old size + N",
2. (optional) When "resize to old size * FACTOR", allow the factor
   to be 1.5 or 1.3 or something like that.

> >   - Some functions take keysize and valuesize arguments, some don't.
> >     I'm confused.
> 
> The "noalloc" functions just store the given pointers for key and value,
> that means without allocation. The hashmap takes ownership of the
> objects and has to release them when being removed/destroyed (or use a
> NULL destructor when the values are static). This can be used to
> generate a search index for an existing bunch of values (objects /
> structures). Like a "unique index" for a field in a database table.
> Example function: wget_hashmap_put_noalloc()
> 
> Then we have the functions that do (shallow) allocations/clones, like
> wget_hashmap_put(). This is for using the hashmap as a container, the
> original objects have to be (shallow) released outside of the container.

Hmm. What you describe here is whether ownership is passed to the container
or kept outside the container. My question was about the keysize and
valuesize arguments, that is, about the nature of the key and value objects
(a. a pointer, pointing to anything in memory, or
 b. a region of memory, with a start address and an end address).

I think both questions are orthogonal:
  - In either a or b, you can keep the ownership outside the container
    by using a NULL destructor function.
  - For container-owned objects:
    In case a, you need a clone or copy function indeed,
    In case b, the container needs to clone precisely the given region of
    memory.

But I haven't thought long about it. What do you think?

Bruno



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-12-04  2:32     ` Bruno Haible
  2018-12-04 19:36       ` Tim Rühsen
  2018-12-04 19:40       ` Tim Rühsen
@ 2019-01-03 16:32       ` Tim Rühsen
  2019-01-03 17:10         ` nested functions Bruno Haible
  2019-01-04 11:59         ` [RFC] Adding a real HashTable implementation to gnulib Florian Weimer
  2 siblings, 2 replies; 15+ messages in thread
From: Tim Rühsen @ 2019-01-03 16:32 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib, Darshit Shah


[-- Attachment #1.1: Type: text/plain, Size: 2817 bytes --]

On 12/4/18 3:32 AM, Bruno Haible wrote:
> Hi Tim,
> 
>> We have 'hashmap' [1] in GNU Wget2.
> 
> Things I like about the implementation:
> 
>   - Collision resolution through linked list (a robust algorithm).
> 
>   - Reasonably generic API.
> 
>   - The user-definable destructor functions.
> 
> Things that could be improved:
> 
>   - The ability to set a growth policy > 0. This leads to quadratic
>     run time behaviour. I mean, you make such an effort to have O(1)
>     access on average, and then the fixed-size increment of the size
>     turns the insertion operation into an average-O(n) operation.

Changed now, only supporting a float number as factor now.


>   - Some functions take keysize and valuesize arguments, some don't.
>     I'm confused.

That has a reason, I'll answer with examples in a second email. Not
enough time right now.


>   - NULL values are special. The documentation says "If there are NULL
>     values in the stringmap, you should use wget_stringmap_get_null() to
>     distinguish between 'not found' and 'found NULL value'." I would
>     prefer an API that does not require me to think about whether I have
>     null values in the map or not.

API has been changed to handle NULL values without thinking (put and get).


>   - To iterate through the table, you need to define an instance of the
>     wget_hashmap_browse_t function. I love functional programming, but
>     C is not the right language for that. If the inner part (the body
>     of the 'browse' function) needs to access outer variables, the
>     programmer needs to manually create a "context" struct and fill it -
>     things that the compiler would be doing in other programming languages.
>     Some people say that the solution to this problem are the nested
>     functions supported by GCC, but
>       1. This is not portable; only GCC supports this.
>       2. On many CPUs (including x86_64), the use of nested functions
>          requires to disable on the entire process a security feature
>          (namely, stacks without execute permission).
>     I therefore prefer the concept of "iterator" objects that allow
>     the programmer to step through the hash table without contorting
>     their code and without compromising on security.

Iterators are now implemented with three functions (...alloc, ...free,
...next).

BTW, nested functions are pretty nice to use.

But I still wonder why

 - nested functions need an executable stack (why does the code have to
be run on the stack ?)

 - there are no efforts to standardize either nested functions or blocks
(clang has 'blocks' instead of nested functions) or both. (There is a
C22 panel, but I couldn't find anything useful on their task list.)

Regards, Tim


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: nested functions
  2019-01-03 16:32       ` Tim Rühsen
@ 2019-01-03 17:10         ` Bruno Haible
  2019-01-04 11:59         ` [RFC] Adding a real HashTable implementation to gnulib Florian Weimer
  1 sibling, 0 replies; 15+ messages in thread
From: Bruno Haible @ 2019-01-03 17:10 UTC (permalink / raw)
  To: Tim Rühsen; +Cc: bug-gnulib, Darshit Shah

[Not sure this is the right forum. Maybe stackoverflow.com would be a better
 place to discuss this.]

Tim Rühsen wrote:
> BTW, nested functions are pretty nice to use.
> 
> But I still wonder why
> 
>  - nested functions need an executable stack (why does the code have to
> be run on the stack ?)

Because in most ABIs, a function pointer is a pointer to the first instruction.
The notable exceptions are:
  - PowerPC and PowerPC64 with AIX ABI,
  - ia64
  - hppa, hppa64.
See https://git.savannah.gnu.org/gitweb/?p=libffcall.git;a=blob;f=porting-tools/abis/function-pointer.txt

>  - there are no efforts to standardize either nested functions or blocks
> (clang has 'blocks' instead of nested functions) or both.

As far as I understand, this [1] would be a profound change of the C language.
Possibly people feel that C shouldn't evolve in an uncontrolled and crazy way
like C++ does...

Bruno

[1] https://en.wikipedia.org/wiki/Blocks_(C_language_extension)



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2019-01-03 16:32       ` Tim Rühsen
  2019-01-03 17:10         ` nested functions Bruno Haible
@ 2019-01-04 11:59         ` Florian Weimer
  1 sibling, 0 replies; 15+ messages in thread
From: Florian Weimer @ 2019-01-04 11:59 UTC (permalink / raw)
  To: Tim Rühsen; +Cc: bug-gnulib, Bruno Haible, Darshit Shah

* Tim Rühsen:

> But I still wonder why
>
>  - nested functions need an executable stack (why does the code have to
> be run on the stack ?)

The static chain pointer needs to be passed to the nest function, but
most ABIs only have a code pointer.  So a trampoline is written to the
stack which sets up the static chain (in a hidden function argument) and
calls the implementation.  The address of the trampoline is used as the
code pointer for the nested function.

>  - there are no efforts to standardize either nested functions or blocks
> (clang has 'blocks' instead of nested functions) or both. (There is a
> C22 panel, but I couldn't find anything useful on their task list.)

You can simply use C++.  C++11 and later have std::function and lambdas.

The standardization of lambdas for C has been abandoned, as far as I
know.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Adding a real HashTable implementation to gnulib
  2018-12-04 23:43         ` Bruno Haible
@ 2019-08-26 14:24           ` Tim Rühsen
  0 siblings, 0 replies; 15+ messages in thread
From: Tim Rühsen @ 2019-08-26 14:24 UTC (permalink / raw)
  To: Bruno Haible, bug-gnulib


[-- Attachment #1.1: Type: text/plain, Size: 3395 bytes --]

Hi Bruno,

sorry for the delay. This somewhat dropped off the tip of my list, but
lurks at the edge of my consciousness, coming up from time to time.

On 12/5/18 12:43 AM, Bruno Haible wrote:
> Hi Tim,
> 
>> Currently we have
>> - positive integer (N): resize by old value * N
>> - negative integer (N): resize by old value + (-N)
> 
> It's currently the opposite:
> - positive integer (N): resize to old size + N
> - negative integer (N): resize to old size * (-N)
> 
>>>   - The ability to set a growth policy > 0. This leads to quadratic
>>>     run time behaviour. I mean, you make such an effort to have O(1)
>>>     access on average, and then the fixed-size increment of the size
>>>     turns the insertion operation into an average-O(n) operation.
>>
>> In Wget2 we use reasonable default sizes for the hashmap to make
>> resizing more or less a corner case.
> 
> The problem is that when you guessed wrong about the "corner case",
> the rehash will consume 99.9% of your program's run time.
> 
>> How can we do it better ?
> 
> 1. Remove the ability to "resize to old size + N",
> 2. (optional) When "resize to old size * FACTOR", allow the factor
>    to be 1.5 or 1.3 or something like that.

That's what I did meanwhile, FACTOR is now a float and that's it.


>>>   - Some functions take keysize and valuesize arguments, some don't.
>>>     I'm confused.
>>
>> The "noalloc" functions just store the given pointers for key and value,
>> that means without allocation. The hashmap takes ownership of the
>> objects and has to release them when being removed/destroyed (or use a
>> NULL destructor when the values are static). This can be used to
>> generate a search index for an existing bunch of values (objects /
>> structures). Like a "unique index" for a field in a database table.
>> Example function: wget_hashmap_put_noalloc()
>>
>> Then we have the functions that do (shallow) allocations/clones, like
>> wget_hashmap_put(). This is for using the hashmap as a container, the
>> original objects have to be (shallow) released outside of the container.
> 
> Hmm. What you describe here is whether ownership is passed to the container
> or kept outside the container. My question was about the keysize and
> valuesize arguments, that is, about the nature of the key and value objects
> (a. a pointer, pointing to anything in memory, or
>  b. a region of memory, with a start address and an end address).
> 
> I think both questions are orthogonal:
>   - In either a or b, you can keep the ownership outside the container
>     by using a NULL destructor function.
>   - For container-owned objects:
>     In case a, you need a clone or copy function indeed,
>     In case b, the container needs to clone precisely the given region of
>     memory.
> 
> But I haven't thought long about it. What do you think?

I thought a while about this and finally decided to leave memory
allocation outside the container, in the responsibility of the caller.

That reduces the number of functions, makes things easier to understand,
reduce overall code. That moves the API usage more into the direction of
a no-brainer. The additional code complexity in the application (wget2)
was much less than I feared.

Thank you for your hints / thoughts and please excuse me for taking so
long to answer.

Regards, Tim


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-08-26 14:47 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-26 23:16 [RFC] Adding a real HashTable implementation to gnulib Darshit Shah
2018-11-26 23:22 ` Ben Pfaff
2018-11-27  1:02   ` Darshit Shah
2018-11-27  4:47     ` Ben Pfaff
2018-12-02 13:41 ` Bruno Haible
2018-12-02 15:05   ` LRN
2018-12-03  9:13   ` Tim Rühsen
2018-12-04  2:32     ` Bruno Haible
2018-12-04 19:36       ` Tim Rühsen
2018-12-04 19:40       ` Tim Rühsen
2018-12-04 23:43         ` Bruno Haible
2019-08-26 14:24           ` Tim Rühsen
2019-01-03 16:32       ` Tim Rühsen
2019-01-03 17:10         ` nested functions Bruno Haible
2019-01-04 11:59         ` [RFC] Adding a real HashTable implementation to gnulib Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).