unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: enh <enh@google.com>
To: Fangrui Song <maskray@gcc.gnu.org>
Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	GNU C Library <libc-alpha@sourceware.org>,
	 Florian Weimer <fweimer@redhat.com>
Subject: Re: CREL dynamic relocations
Date: Fri, 12 Apr 2024 09:18:56 -0700	[thread overview]
Message-ID: <CAJgzZoox6GqTUwdjD0mqXs=A=O8YSm_qC_hzhtf10_egmAmrDw@mail.gmail.com> (raw)
In-Reply-To: <CAN30aBGaKwqCYSEhpjL6LX0+T65dzgpKR4pkXTudWEs1qWQ68g@mail.gmail.com>

On Wed, Apr 10, 2024 at 7:41 PM Fangrui Song <maskray@gcc.gnu.org> wrote:
>
> Thank you for your interest in the CREL relocation format.
>
> On Tue, Apr 9, 2024 at 8:33 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> > I like the general idea of more compact relocations, however what I don't get is
> > what the overall goal is. If the goal is more compact object files, why don't we just
> > add a (de)compress pass using a fast compression algorithm? CPU time is cheap
> > today, and real compression easily gives 2-4x reduction of object file size, far more
> > than you could achieve by just compressing relocations.
>
> My primary goal is to make relocatable files smaller (see
> https://sourceware.org/pipermail/binutils/2024-March/133229.html for a
> few use cases).
> Smaller files benefit applications in several ways, including smaller
> I/O amount and lower linker memory usage (for linkers like gold, lld,
> and mold that map input files into memory).
>
> Generic data compression formats (like zlib or zstd) applied at the
> filesystem level won't achieve this goal because they don't decrease
> memory usage.
> In addition, filesystem compression does not appear too popular.
>
> Interestingly, I measured a 5.9% size reduction in .o files even after
> zstd compression when comparing two Clang builds with and without
> CREL.
>
>     % ruby -e 'require "zstd-ruby"; un=com=0;
> Dir.glob("/tmp/out/s2-custom0/**/*.o").each{|f| x =
> File.open(f,"rb"){|h|h.read}; un+=x.size; com+=Zstd.compress(x).size};
> puts "uncompressed: #{un}\ncompressed: #{com}"'
>     uncompressed: 136086784
>     compressed: 37173381
>
>     % ruby -e 'require "zstd-ruby"; un=com=0;
> Dir.glob("/tmp/out/s2-custom1/**/*.o").each{|f| x =
> File.open(f,"rb"){|h|h.read}; un+=x.size; com+=Zstd.compress(x).size};
> puts "uncompressed: #{un}\ncompressed: #{com}"'
>     uncompressed: 111655952
>     compressed: 34964421
>
>     1-111655952/136086784 ~= 18.0% (uncompressed)
>     1-34964421/37173381 ~= 5.9%    (zstd)
>
> Another objective is to minimize the size of dynamic relocations.
> Android achieves this through ld.lld --pack-dyn-relocs=android+relr,
> which compacts RELA relocations in their packed format.
> While effective, CREL offers a simpler approach that delivers even
> greater size reductions.

yeah, though android+relr has the advantage of having already shipped,
so it's usable on just about every device that app developers still
support (api 23 --
https://android.googlesource.com/platform/bionic/+/master/android-changes-for-ndk-developers.md#relative-relocations-relr
-- versus the effective minimum of api 21) :-)

something new today wouldn't be as widely usable on Android for about
a decade, so although we'd probably support it if others did, to be
really interesting for Android -- the point where we'd implement it
even if no-one else did -- it'd have to be 2x better (in space or
time, and not sacrificing the other, because app developers are very
conscious of both) rather than the little bit better that it actually
is. given our long lead time [for app developers to be able to rely on
ubiquity] and our "good enough" solution, i'm actually more interested
in the "what if we re-thought _everything_?" end of the spectrum than
small tweaks here.

what does mach-o do here? (that is: "why doesn't Apple care?".)

> > Alternatively, if we wanted to access and process ELF files without any decompression,
> > we could define compact relocations as fixed-size entries. Using 64 bits for a compact
> > RELA relocation gives a straightforward 4x compression. Out of range values could
> > use the next entry to extend the ranges.
>
> 64 bits are quite large. CREL typically needs just one to three bytes
> for one relocation.
> How do you design a format that is generic enough to work with all
> relocation types and symbol indexes?
>
> > So my main issue with the proposal is that it tries too hard to compress relocations.
> > For example using offset compression for relocations, symbol indices and even addends
> > seems to have little value: the signed offset means you lose one bit, and if out of range
> > values are rare or not grouped together, offset encodings are actually less efficient.
>
> I actually use unsigned delta offset to save one bit but signed delta
> symidx/addend.
> I have analyzed how many bits are demanded by typical relocations.
> Quote https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf#crel-relocation-format
> :
>
>     Absolute symbol indexes allow one-byte encoding for symbols in the
> range [0,128) and offer minor size advantage for static relocations
> when the symbol table is sorted by usage frequency. Delta encoding, on
> the other hand, might optimize for the scenario when the symbol table
> presents locality: neighbor symbols are frequently mutually called.
>
>     Delta symbol index enables one-byte encoding for GOT/PLT dynamic
> relocations when .got/.got.plt entries are ordered by symbol index.
> For example, R_*_GLOB_DAT and R_*_JUMP_SLOT relocations can typically
> be encoded with repeated 0x05 0x01 (when addend_bit==0 && shift==3,
> offset++, symidx++). Delta encoding has a disvantage. It can partial
> claim the optimization by arranging symbols in a "cold0 hot cold1"
> pattern. In addition, delta symbol index enables one-byte encoding for
> GOT/PLT dynamic relocations when .got/.got.plt entries are ordered by
> symbol index.
>
>     In my experiments, absolute encoding with ULEB128 results in
> slightly larger .o file sizes for both x86-64 and AArch64 builds.
>
> For a decoder that only supports in-reloc addends (recommended for
> relocatable files), the C++ implementation is as simple as:
>
>   const auto hdr = decodeULEB128(p);
>   const size_t count = hdr / 8, shift = hdr % 4;
>   Elf_Addr offset = 0, addend = 0;
>   uint32_t symidx = 0, type = 0;
>   for (size_t i = 0; i != count; ++i) {
>     const uint8_t b = *p++;
>     offset += b >> 3;
>     if (b >= 0x80) offset += (decodeULEB128(p) << 4) - 0x10;
>     if (b & 1) symidx += decodeSLEB128(p);
>     if (b & 2) type += decodeSLEB128(p);
>     if (b & 4) addend += decodeSLEB128(p);
>     rels[i] = {offset << shift, symidx, type, addend};
>   }
>
> += for all of symidx/type/addend is for consistency, but the choice
> turns out to be very good as well.
>
> > I don't get the discussion about relocation numbers on AArch64 - 4 or 5 bits would
> > handle all frequently used relocations, so we'd just remap them to fit in the short
> > encoding. Hence I don't see a need at all for a signed offset encoding.
>
> The common static relocation types are within [257,313] (before
> R_AARCH64_PLT32).
> Delta encoding allows ~all but the first relocation's type to be
> encoded in a single byte.
>
> How do you design a compression scheme without baked-in knowledge (dictionary)?
> We don't want the generic encoding scheme to hard code relocation type
> range for each architecture.

  reply	other threads:[~2024-04-12 16:19 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-09 15:32 CREL dynamic relocations Wilco Dijkstra
2024-04-11  2:41 ` Fangrui Song
2024-04-12 16:18   ` enh [this message]
2024-04-18  4:31     ` Fangrui Song
  -- strict thread matches above, loose matches on Subject: below --
2024-03-24  5:50 Fangrui Song
2024-03-25 11:52 ` Florian Weimer
2024-03-25 18:51   ` Fangrui Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJgzZoox6GqTUwdjD0mqXs=A=O8YSm_qC_hzhtf10_egmAmrDw@mail.gmail.com' \
    --to=enh@google.com \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=fweimer@redhat.com \
    --cc=libc-alpha@sourceware.org \
    --cc=maskray@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).