unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Florian Weimer via Libc-alpha <libc-alpha@sourceware.org>
To: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
Subject: Re: [PATCH 24/28] elf: Implement a string table for ldconfig, with tail merging
Date: Fri, 30 Oct 2020 18:08:02 +0100	[thread overview]
Message-ID: <87v9erk5fx.fsf@oldenburg2.str.redhat.com> (raw)
In-Reply-To: <2fa948f4-4cc9-750d-3282-7df1d4dd4498@linaro.org> (Adhemerval Zanella via Libc-alpha's message of "Tue, 20 Oct 2020 11:25:59 -0300")

* Adhemerval Zanella via Libc-alpha:

> On 01/10/2020 13:34, Florian Weimer via Libc-alpha wrote:
>> This will be used in ldconfig to reduce the ld.so.cache size slightly.
>
> Could you extend the commit message to explain why the 32-bit FNV-1a hash
> is used and how the hash table is organized (how collisions are handled,
> expected memory usage, strategy used to increase/decrease the bucket size)?
>
> It could help also to explain the usercase a bit more, the 'tail merging'
> is not straightforward to understand without dig deep in the code.  Also
> why kind of cache size decrease do you expect by using strategy?

That depends whether cached DSOs use sonames as file names (so that no
symbolic links are needed).  Typically that's not the case today, so the
savings are really small.  glibc-hwcaps may change that.

The repost will have an expanded commit message:

elf: Implement a string table for ldconfig, with tail merging

This will be used in ldconfig to reduce the ld.so.cache size slightly.

Tail merging is an optimization where a pointer points into another
string if the first string is a suffix of the second string.

The hash function FNV-1a was chosen because it is simple and achieves
good dispersion even for short strings (so that the hash table bucket
count can be a power of two).  It is clearly superior to the hsearch
hash and the ELF hash in this regard.

The hash table uses chaining for collision resolution.

>> diff --git a/elf/stringtable.c b/elf/stringtable.c
>> new file mode 100644
>> index 0000000000..f9ade50249
>> --- /dev/null
>> +++ b/elf/stringtable.c
>> @@ -0,0 +1,201 @@
>> +/* String tables for ld.so.cache construction.  Implementation.
>
> This file misses the Copyright year.

Fixed throughout.

>> +static void
>> +stringtable_init (struct stringtable *table)
>> +{
>> +  table->count = 0;
>> +  table->allocated = 16;
>> +  table->entries = xcalloc (table->allocated, sizeof (table->entries[0]));
>> +}
>> +
>
> Why 16 elements as initial size?

I'm increasing it to 128 with a comment.  128 is based on the number of
DSOs within glibc itself.

>> +struct stringtable_entry *
>> +stringtable_intern (struct stringtable *table, const char *string)
>> +{
>> +  if (table->allocated == 0)
>> +    stringtable_init (table);
>
> How this could happen? Is it expect the caller to set 'allocated'
> explicitly?

Zero-initialization is valid.  stringtable_free also leaves the table
ready for re-use.

>> +  /* Copy the strings.  */
>> +  for (uint32_t i = 0; i < table->allocated; ++i)
>> +    for (struct stringtable_entry *e = table->entries[i]; e != NULL;
>> +         e = e->next)
>> +      if (result->strings[e->offset] == '\0')
>> +        memcpy (&result->strings[e->offset], e->string, e->length + 1);
>> +}
>
> Ok, I guess allocating a new stringtable_finalized should be simpler than
> operating the table itself.

Sorry, I don't understand.  Do you mean reusing the table allocation in
some way?  Yes, that would be fairly complicated.

>> +/* Adds STRING to TABLE.  May return the address of an existing entry.  */
>> +struct stringtable_entry *stringtable_intern (struct stringtable *table,
>> +                                              const char *string);
>
> I think this name is confusing, why not just 'stringtable_add' or
> 'stringtable_add_element'?

I'm changing it to stringtable_add.  Didn't realize that interning is
obscure terminology.

Thanls,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


  reply	other threads:[~2020-10-30 17:08 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-01 16:31 [PATCH 00/28] glibc-hwcaps support Florian Weimer via Libc-alpha
2020-10-01 16:31 ` [PATCH 01/28] elf: Do not search HWCAP subdirectories in statically linked binaries Florian Weimer via Libc-alpha
2020-10-01 18:22   ` Adhemerval Zanella via Libc-alpha
2020-10-01 18:24     ` Carlos O'Donell via Libc-alpha
2020-10-01 18:29       ` Adhemerval Zanella via Libc-alpha
2020-10-01 20:24         ` Carlos O'Donell via Libc-alpha
2020-10-01 16:31 ` [PATCH 02/28] elf: Implement __rtld_malloc_is_full Florian Weimer via Libc-alpha
2020-10-01 18:23   ` Adhemerval Zanella via Libc-alpha
2020-10-08  9:44     ` Florian Weimer via Libc-alpha
2020-10-01 16:31 ` [PATCH 03/28] elf: Implement _dl_write Florian Weimer via Libc-alpha
2020-10-05 19:46   ` Adhemerval Zanella via Libc-alpha
2020-10-01 16:31 ` [PATCH 04/28] elf: Extract command-line/environment variables state from rtld.c Florian Weimer via Libc-alpha
2020-10-06 20:45   ` Adhemerval Zanella via Libc-alpha
2020-10-08 11:32     ` Florian Weimer via Libc-alpha
2020-10-01 16:32 ` [PATCH 05/28] elf: Move ld.so error/help output to _dl_usage Florian Weimer via Libc-alpha
2020-10-06 21:06   ` Adhemerval Zanella via Libc-alpha
2020-10-08 12:19     ` Florian Weimer via Libc-alpha
2020-10-01 16:32 ` [PATCH 06/28] elf: Record whether paths come from LD_LIBRARY_PATH or --library-path Florian Weimer via Libc-alpha
2020-10-07 16:39   ` Adhemerval Zanella via Libc-alpha
2020-10-07 16:49     ` Florian Weimer
2020-10-01 16:32 ` [PATCH 07/28] elf: Implement ld.so --help Florian Weimer via Libc-alpha
2020-10-07 17:16   ` Adhemerval Zanella via Libc-alpha
2020-10-08 13:13     ` Florian Weimer via Libc-alpha
2020-10-01 16:32 ` [PATCH 08/28] elf: Implement ld.so --version Florian Weimer via Libc-alpha
2020-10-07 18:36   ` Adhemerval Zanella via Libc-alpha
2020-10-07 18:38     ` Adhemerval Zanella via Libc-alpha
2020-10-08 13:37     ` Florian Weimer via Libc-alpha
2020-10-01 16:32 ` [PATCH 09/28] scripts/update-copyrights: Update csu/version.c, elf/dl-usage.c Florian Weimer via Libc-alpha
2020-10-07 18:41   ` Adhemerval Zanella via Libc-alpha
2020-10-01 16:32 ` [PATCH 10/28] elf: Use the term "program interpreter" in the ld.so help message Florian Weimer via Libc-alpha
2020-10-07 21:08   ` Adhemerval Zanella via Libc-alpha
2020-10-08 14:08     ` Florian Weimer via Libc-alpha
2020-10-01 16:32 ` [PATCH 11/28] elf: Print the full name of the dynamic loader " Florian Weimer via Libc-alpha
2020-10-08 12:38   ` Adhemerval Zanella via Libc-alpha
2020-10-01 16:32 ` [PATCH 12/28] elf: Make __rtld_env_path_list and __rtld_search_dirs global variables Florian Weimer via Libc-alpha
2020-10-08 13:27   ` Adhemerval Zanella via Libc-alpha
2020-10-01 16:32 ` [PATCH 13/28] elf: Add library search path information to ld.so --help Florian Weimer via Libc-alpha
2020-10-08 16:22   ` Adhemerval Zanella via Libc-alpha
2020-10-01 16:33 ` [PATCH 14/28] elf: Enhance ld.so --help to print HWCAP subdirectories Florian Weimer via Libc-alpha
2020-10-08 16:27   ` Adhemerval Zanella via Libc-alpha
2020-10-09  8:18     ` Florian Weimer via Libc-alpha
2020-10-09 13:49   ` Matheus Castanho via Libc-alpha
2020-10-09 17:08     ` Florian Weimer via Libc-alpha
2020-10-09 17:12       ` Florian Weimer via Libc-alpha
2020-10-09 18:54         ` Matheus Castanho via Libc-alpha
2020-10-12  9:47           ` Florian Weimer via Libc-alpha
2020-10-01 16:33 ` [PATCH 15/28] elf: Do not pass GLRO(dl_platform), GLRO(dl_platformlen) to _dl_important_hwcaps Florian Weimer via Libc-alpha
2020-10-08 18:04   ` Adhemerval Zanella via Libc-alpha
2020-10-01 16:33 ` [PATCH 16/28] elf: Add glibc-hwcaps support for LD_LIBRARY_PATH Florian Weimer via Libc-alpha
2020-10-08 10:13   ` Szabolcs Nagy via Libc-alpha
2020-10-09  9:08     ` Florian Weimer via Libc-alpha
2020-10-09 10:50       ` Szabolcs Nagy via Libc-alpha
2020-10-09 10:55         ` Florian Weimer via Libc-alpha
2020-10-09 11:03           ` Szabolcs Nagy via Libc-alpha
2020-10-08 23:16   ` Paul A. Clarke via Libc-alpha
2020-10-09  8:56     ` Florian Weimer via Libc-alpha
2020-10-09 13:19   ` Adhemerval Zanella via Libc-alpha
2020-10-12 11:54     ` Florian Weimer via Libc-alpha
2020-10-01 16:33 ` [PATCH 17/28] x86_64: Add glibc-hwcaps support Florian Weimer via Libc-alpha
2020-10-01 16:33 ` [PATCH 18/28] powerpc64le: " Florian Weimer via Libc-alpha
2020-10-01 18:56   ` Paul A. Clarke via Libc-alpha
2020-10-05  9:47     ` Florian Weimer via Libc-alpha
2020-10-05 19:15       ` Paul A. Clarke via Libc-alpha
2020-10-06 12:20         ` Florian Weimer via Libc-alpha
2020-10-06 17:45           ` Paul A. Clarke via Libc-alpha
2020-10-09  9:06             ` Florian Weimer via Libc-alpha
2020-10-01 16:33 ` [PATCH 19/28] s390x: Add " Florian Weimer via Libc-alpha
2020-10-01 16:33 ` [PATCH 20/28] aarch64: " Florian Weimer via Libc-alpha
2020-10-14 13:46   ` Adhemerval Zanella via Libc-alpha
2020-10-14 14:08     ` Florian Weimer via Libc-alpha
2020-10-14 14:15       ` Adhemerval Zanella via Libc-alpha
2020-10-14 14:37         ` Szabolcs Nagy via Libc-alpha
2020-10-14 14:43           ` Adhemerval Zanella via Libc-alpha
2020-10-14 15:13             ` Florian Weimer via Libc-alpha
2020-10-14 14:44           ` Florian Weimer via Libc-alpha
2020-10-14 15:09             ` Szabolcs Nagy via Libc-alpha
2020-10-01 16:33 ` [PATCH 21/28] elf: Add endianness markup to ld.so.cache Florian Weimer via Libc-alpha
2020-10-14 14:07   ` Adhemerval Zanella via Libc-alpha
2020-10-01 16:33 ` [PATCH 22/28] elf: Add extension mechanism " Florian Weimer via Libc-alpha
2020-10-15 17:52   ` Adhemerval Zanella via Libc-alpha
2020-10-30 12:22     ` Florian Weimer via Libc-alpha
2020-11-03 12:45       ` Adhemerval Zanella via Libc-alpha
2020-11-03 15:30         ` Florian Weimer via Libc-alpha
2020-10-01 16:34 ` [PATCH 23/28] elf: Unify old and new format cache handling code in ld.so Florian Weimer via Libc-alpha
2020-10-16 14:37   ` Adhemerval Zanella via Libc-alpha
2020-10-30 13:22     ` Florian Weimer via Libc-alpha
2020-11-03 13:02       ` Adhemerval Zanella via Libc-alpha
2020-10-01 16:34 ` [PATCH 24/28] elf: Implement a string table for ldconfig, with tail merging Florian Weimer via Libc-alpha
2020-10-20 14:25   ` Adhemerval Zanella via Libc-alpha
2020-10-30 17:08     ` Florian Weimer via Libc-alpha [this message]
2020-11-03 13:05       ` Adhemerval Zanella via Libc-alpha
2020-11-03 15:29         ` Florian Weimer via Libc-alpha
2020-10-01 16:34 ` [PATCH 25/28] elf: Implement tail merging of strings in ldconfig Florian Weimer via Libc-alpha
2020-10-22 21:08   ` Adhemerval Zanella via Libc-alpha
2020-10-30 17:36     ` Florian Weimer via Libc-alpha
2020-10-01 16:34 ` [PATCH 26/28] elf: In ldconfig, extract the new_sub_entry function from search_dir Florian Weimer via Libc-alpha
2020-10-27 13:15   ` Adhemerval Zanella via Libc-alpha
2020-10-01 16:34 ` [PATCH 27/28] elf: Process glibc-hwcaps subdirectories in ldconfig Florian Weimer via Libc-alpha
2020-10-27 17:28   ` Adhemerval Zanella via Libc-alpha
2020-11-04 11:57     ` Florian Weimer via Libc-alpha
2020-10-01 16:34 ` [PATCH 28/28] elf: Add glibc-hwcaps subdirectory support to ld.so cache processing Florian Weimer via Libc-alpha
2020-10-01 16:50 ` [PATCH 00/28] glibc-hwcaps support H.J. Lu via Libc-alpha
2020-10-01 16:54   ` Florian Weimer via Libc-alpha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v9erk5fx.fsf@oldenburg2.str.redhat.com \
    --to=libc-alpha@sourceware.org \
    --cc=fweimer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).