unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Fangrui Song <maskray@gcc.gnu.org>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
	Florian Weimer <fweimer@redhat.com>
Subject: Re: CREL dynamic relocations
Date: Wed, 10 Apr 2024 19:41:16 -0700	[thread overview]
Message-ID: <CAN30aBGaKwqCYSEhpjL6LX0+T65dzgpKR4pkXTudWEs1qWQ68g@mail.gmail.com> (raw)
In-Reply-To: <PAWPR08MB89828687CE183D4EC04DAAF483072@PAWPR08MB8982.eurprd08.prod.outlook.com>

Thank you for your interest in the CREL relocation format.

On Tue, Apr 9, 2024 at 8:33 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> I like the general idea of more compact relocations, however what I don't get is
> what the overall goal is. If the goal is more compact object files, why don't we just
> add a (de)compress pass using a fast compression algorithm? CPU time is cheap
> today, and real compression easily gives 2-4x reduction of object file size, far more
> than you could achieve by just compressing relocations.

My primary goal is to make relocatable files smaller (see
https://sourceware.org/pipermail/binutils/2024-March/133229.html for a
few use cases).
Smaller files benefit applications in several ways, including smaller
I/O amount and lower linker memory usage (for linkers like gold, lld,
and mold that map input files into memory).

Generic data compression formats (like zlib or zstd) applied at the
filesystem level won't achieve this goal because they don't decrease
memory usage.
In addition, filesystem compression does not appear too popular.

Interestingly, I measured a 5.9% size reduction in .o files even after
zstd compression when comparing two Clang builds with and without
CREL.

    % ruby -e 'require "zstd-ruby"; un=com=0;
Dir.glob("/tmp/out/s2-custom0/**/*.o").each{|f| x =
File.open(f,"rb"){|h|h.read}; un+=x.size; com+=Zstd.compress(x).size};
puts "uncompressed: #{un}\ncompressed: #{com}"'
    uncompressed: 136086784
    compressed: 37173381

    % ruby -e 'require "zstd-ruby"; un=com=0;
Dir.glob("/tmp/out/s2-custom1/**/*.o").each{|f| x =
File.open(f,"rb"){|h|h.read}; un+=x.size; com+=Zstd.compress(x).size};
puts "uncompressed: #{un}\ncompressed: #{com}"'
    uncompressed: 111655952
    compressed: 34964421

    1-111655952/136086784 ~= 18.0% (uncompressed)
    1-34964421/37173381 ~= 5.9%    (zstd)

Another objective is to minimize the size of dynamic relocations.
Android achieves this through ld.lld --pack-dyn-relocs=android+relr,
which compacts RELA relocations in their packed format.
While effective, CREL offers a simpler approach that delivers even
greater size reductions.

> Alternatively, if we wanted to access and process ELF files without any decompression,
> we could define compact relocations as fixed-size entries. Using 64 bits for a compact
> RELA relocation gives a straightforward 4x compression. Out of range values could
> use the next entry to extend the ranges.

64 bits are quite large. CREL typically needs just one to three bytes
for one relocation.
How do you design a format that is generic enough to work with all
relocation types and symbol indexes?

> So my main issue with the proposal is that it tries too hard to compress relocations.
> For example using offset compression for relocations, symbol indices and even addends
> seems to have little value: the signed offset means you lose one bit, and if out of range
> values are rare or not grouped together, offset encodings are actually less efficient.

I actually use unsigned delta offset to save one bit but signed delta
symidx/addend.
I have analyzed how many bits are demanded by typical relocations.
Quote https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf#crel-relocation-format
:

    Absolute symbol indexes allow one-byte encoding for symbols in the
range [0,128) and offer minor size advantage for static relocations
when the symbol table is sorted by usage frequency. Delta encoding, on
the other hand, might optimize for the scenario when the symbol table
presents locality: neighbor symbols are frequently mutually called.

    Delta symbol index enables one-byte encoding for GOT/PLT dynamic
relocations when .got/.got.plt entries are ordered by symbol index.
For example, R_*_GLOB_DAT and R_*_JUMP_SLOT relocations can typically
be encoded with repeated 0x05 0x01 (when addend_bit==0 && shift==3,
offset++, symidx++). Delta encoding has a disvantage. It can partial
claim the optimization by arranging symbols in a "cold0 hot cold1"
pattern. In addition, delta symbol index enables one-byte encoding for
GOT/PLT dynamic relocations when .got/.got.plt entries are ordered by
symbol index.

    In my experiments, absolute encoding with ULEB128 results in
slightly larger .o file sizes for both x86-64 and AArch64 builds.

For a decoder that only supports in-reloc addends (recommended for
relocatable files), the C++ implementation is as simple as:

  const auto hdr = decodeULEB128(p);
  const size_t count = hdr / 8, shift = hdr % 4;
  Elf_Addr offset = 0, addend = 0;
  uint32_t symidx = 0, type = 0;
  for (size_t i = 0; i != count; ++i) {
    const uint8_t b = *p++;
    offset += b >> 3;
    if (b >= 0x80) offset += (decodeULEB128(p) << 4) - 0x10;
    if (b & 1) symidx += decodeSLEB128(p);
    if (b & 2) type += decodeSLEB128(p);
    if (b & 4) addend += decodeSLEB128(p);
    rels[i] = {offset << shift, symidx, type, addend};
  }

+= for all of symidx/type/addend is for consistency, but the choice
turns out to be very good as well.

> I don't get the discussion about relocation numbers on AArch64 - 4 or 5 bits would
> handle all frequently used relocations, so we'd just remap them to fit in the short
> encoding. Hence I don't see a need at all for a signed offset encoding.

The common static relocation types are within [257,313] (before
R_AARCH64_PLT32).
Delta encoding allows ~all but the first relocation's type to be
encoded in a single byte.

How do you design a compression scheme without baked-in knowledge (dictionary)?
We don't want the generic encoding scheme to hard code relocation type
range for each architecture.

  reply	other threads:[~2024-04-11  2:42 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-09 15:32 CREL dynamic relocations Wilco Dijkstra
2024-04-11  2:41 ` Fangrui Song [this message]
2024-04-12 16:18   ` enh
2024-04-18  4:31     ` Fangrui Song
  -- strict thread matches above, loose matches on Subject: below --
2024-03-24  5:50 Fangrui Song
2024-03-25 11:52 ` Florian Weimer
2024-03-25 18:51   ` Fangrui Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAN30aBGaKwqCYSEhpjL6LX0+T65dzgpKR4pkXTudWEs1qWQ68g@mail.gmail.com \
    --to=maskray@gcc.gnu.org \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=fweimer@redhat.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).