References: <PAWPR08MB89828687CE183D4EC04DAAF483072@PAWPR08MB8982.eurprd08.prod.outlook.com>
 <CAN30aBGaKwqCYSEhpjL6LX0+T65dzgpKR4pkXTudWEs1qWQ68g@mail.gmail.com>
In-Reply-To: <CAN30aBGaKwqCYSEhpjL6LX0+T65dzgpKR4pkXTudWEs1qWQ68g@mail.gmail.com>
From: enh <enh@google.com>
Date: Fri, 12 Apr 2024 09:18:56 -0700
Message-ID: <CAJgzZoox6GqTUwdjD0mqXs=A=O8YSm_qC_hzhtf10_egmAmrDw@mail.gmail.com>
Subject: Re: CREL dynamic relocations
To: Fangrui Song <maskray@gcc.gnu.org>
Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
 GNU C Library <libc-alpha@sourceware.org>, 
 Florian Weimer <fweimer@redhat.com>

On Wed, Apr 10, 2024 at 7:41 PM Fangrui Song <maskray@gcc.gnu.org> wrote:
>
> Thank you for your interest in the CREL relocation format.
>
> On Tue, Apr 9, 2024 at 8:33 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> > I like the general idea of more compact relocations; however, what I don't get is
> > what the overall goal is. If the goal is more compact object files, why don't we just
> > add a (de)compress pass using a fast compression algorithm? CPU time is cheap
> > today, and real compression easily gives a 2-4x reduction in object file size, far more
> > than you could achieve by just compressing relocations.
>
> My primary goal is to make relocatable files smaller (see
> https://sourceware.org/pipermail/binutils/2024-March/133229.html for a
> few use cases).
> Smaller files benefit applications in several ways, including less I/O
> and lower linker memory usage (for linkers like gold, lld, and mold
> that map input files into memory).
>
> Generic data compression formats (like zlib or zstd) applied at the
> filesystem level won't achieve this goal because they don't decrease
> memory usage: the linker still maps the full decompressed contents.
> In addition, filesystem compression does not appear to be widely used.
>
> Interestingly, I measured a 5.9% size reduction in .o files even after
> zstd compression when comparing two Clang builds with and without
> CREL.
>
>     % ruby -e 'require "zstd-ruby"; un=com=0;
>       Dir.glob("/tmp/out/s2-custom0/**/*.o").each{|f| x =
>       File.open(f,"rb"){|h|h.read}; un+=x.size; com+=Zstd.compress(x).size};
>       puts "uncompressed: #{un}\ncompressed: #{com}"'
>     uncompressed: 136086784
>     compressed: 37173381
>
>     % ruby -e 'require "zstd-ruby"; un=com=0;
>       Dir.glob("/tmp/out/s2-custom1/**/*.o").each{|f| x =
>       File.open(f,"rb"){|h|h.read}; un+=x.size; com+=Zstd.compress(x).size};
>       puts "uncompressed: #{un}\ncompressed: #{com}"'
>     uncompressed: 111655952
>     compressed: 34964421
>
>     1-111655952/136086784 ~= 18.0% (uncompressed)
>     1-34964421/37173381 ~= 5.9%    (zstd)
>
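The two percentages above can be recomputed from the quoted totals, together with each build's zstd compression ratio. What follows is a minimal C++ sketch. The helper and constant names are illustrative, and it assumes (from the sizes) that s2-custom1 is the CREL build.

```cpp
#include <cstdint>
#include <cstdio>

// Percentage by which `after` is smaller than `before`.
double reductionPercent(uint64_t before, uint64_t after) {
  return (1.0 - static_cast<double>(after) / static_cast<double>(before)) * 100.0;
}

// Total uncompressed size divided by the summed per-file zstd sizes.
double compressionRatio(uint64_t uncompressed, uint64_t compressed) {
  return static_cast<double>(uncompressed) / static_cast<double>(compressed);
}

// Totals from the two runs above (s2-custom1 assumed to be the CREL build).
constexpr uint64_t kRawNoCrel = 136086784, kZstdNoCrel = 37173381;
constexpr uint64_t kRawCrel = 111655952, kZstdCrel = 34964421;

void printSummary() {
  std::printf("uncompressed reduction: %.1f%%\n", reductionPercent(kRawNoCrel, kRawCrel));  // 18.0%
  std::printf("zstd reduction: %.1f%%\n", reductionPercent(kZstdNoCrel, kZstdCrel));        // 5.9%
  std::printf("zstd ratio, baseline: %.2fx\n", compressionRatio(kRawNoCrel, kZstdNoCrel));  // 3.66x
  std::printf("zstd ratio, CREL: %.2fx\n", compressionRatio(kRawCrel, kZstdCrel));          // 3.19x
}
```

On these numbers zstd shrinks the baseline objects about 3.66x but the CREL objects only about 3.19x. This is consistent with zstd already squeezing out much of the redundancy in RELA records, which is why an 18% raw saving narrows to about 6% after compression.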
> Another objective is to minimize the size of dynamic relocations.
> Android achieves this through ld.lld --pack-dyn-relocs=android+relr,
> which stores relative relocations as RELR and the remaining RELA
> relocations in Android's packed format.
> While effective, CREL offers a simpler approach that delivers even
> greater size reductions.

yeah, though android+relr has the advantage of having already shipped,
so it's usable on just about every device that app developers still
support (api 23 --
https://android.googlesource.com/platform/bionic/+/master/android-changes-for-ndk-developers.md#relative-relocations-relr
-- versus the effective minimum of api 21) :-)

something new today wouldn't be as widely usable on Android for about
a decade, so although we'd probably support it if others did, to be
really interesting for Android -- the point where we'd implement it
even if no-one else did -- it'd have to be 2x better (in space or
time, and not sacrificing the other, because app developers are very
conscious of both) rather than the little bit better that it actually
is. given our long lead time [for app developers to be able to rely on
ubiquity] and our "good enough" solution, i'm actually more interested
in the "what if we re-thought _everything_?" end of the spectrum than
small tweaks here.

what does mach-o do here? (that is: "why doesn't Apple care?".)

> > Alternatively, if we wanted to access and process ELF files without any decompression,
> > we could define compact relocations as fixed-size entries. Using 64 bits for a compact
> > RELA relocation gives a straightforward 4x compression. Out-of-range values could
> > use the next entry to extend the ranges.
>
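To make the fixed-size alternative concrete, here is a C++ sketch of one possible 64-bit layout. The field widths, the escape bit, the two extra words for out-of-range values, and the names Reloc, encodeFixed and decodeFixed are all hypothetical choices made for illustration. The suggestion above does not specify a layout. For reference, an Elf64_Rela is 24 bytes (r_offset, r_info and r_addend, 8 bytes each). An 8-byte entry is therefore 3x smaller per relocation, and an escaped entry costs as much as RELA.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory relocation record (not an ELF structure).
struct Reloc {
  uint64_t offset;
  uint32_t symidx;
  uint32_t type;
  int64_t addend;
};

// Hypothetical 64-bit word layout:
//   bit 63       escape flag
//   short form   type[0,12) symidx[12,32) offsetDelta[32,48) addend[48,63) (15-bit signed)
//   escape form  type[0,32) symidx[32,63) (31 bits); the next two words hold
//                the absolute offset and the full 64-bit addend.
constexpr uint64_t kEscape = uint64_t(1) << 63;

void encodeFixed(const std::vector<Reloc>& rels, std::vector<uint64_t>& out) {
  uint64_t prevOffset = 0;
  for (const Reloc& r : rels) {
    const uint64_t delta = r.offset - prevOffset;  // offsets assumed sorted ascending
    const bool fits = r.type < (1u << 12) && r.symidx < (1u << 20) && delta < (1u << 16) &&
                      r.addend >= -(1 << 14) && r.addend < (1 << 14);
    if (fits) {
      out.push_back(uint64_t(r.type) | (uint64_t(r.symidx) << 12) | (delta << 32) |
                    ((uint64_t(r.addend) & 0x7fff) << 48));
    } else {
      // Out of range: one escape word plus two full-width words (24 bytes, like RELA).
      out.push_back(kEscape | r.type | (uint64_t(r.symidx & 0x7fffffff) << 32));
      out.push_back(r.offset);
      out.push_back(uint64_t(r.addend));
    }
    prevOffset = r.offset;
  }
}

std::vector<Reloc> decodeFixed(const std::vector<uint64_t>& in) {
  std::vector<Reloc> rels;
  uint64_t offset = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const uint64_t w = in[i];
    Reloc r{};
    if (!(w & kEscape)) {
      r.type = uint32_t(w & 0xfff);
      r.symidx = uint32_t((w >> 12) & 0xfffff);
      offset += (w >> 32) & 0xffff;
      const int64_t a = int64_t((w >> 48) & 0x7fff);
      r.addend = a >= (1 << 14) ? a - (1 << 15) : a;  // sign-extend 15 bits
    } else {
      r.type = uint32_t(w & 0xffffffff);
      r.symidx = uint32_t((w >> 32) & 0x7fffffff);
      offset = in[++i];
      r.addend = int64_t(in[++i]);
    }
    r.offset = offset;
    rels.push_back(r);
  }
  return rels;
}
```

For example, relocations at 0x1000 and 0x1008 with small symbol indexes take one word each. A relocation with symbol index 70000000, which is too large for 20 bits, takes three. Whatever widths are chosen, common relocations carry unused bits while rarer ones pay for escapes.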
> 64 bits are quite large. CREL typically needs just one to three bytes
> per relocation.
> How do you design a format that is generic enough to work with all
> relocation types and symbol indexes?
>
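For comparison with a fixed 8-byte entry, the following C++ sketch encodes relocations in the byte layout read by the decoder quoted later in this message. The layout starts with a ULEB128 header of count << 3 | addend flag << 2 | shift. Each relocation then gets a flag byte carrying the offset delta, followed by SLEB128 deltas for whichever of symidx, type and addend changed. Two details are assumptions of this sketch: shift is chosen as the largest value in [0,3] that divides every offset, and bit 2 is set to mean "addends present". The names Reloc, writeUleb, writeSleb and encodeCrel are hypothetical. Offsets must be sorted in ascending order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical in-memory relocation record (not an ELF structure).
struct Reloc {
  uint64_t offset;
  uint32_t symidx;
  uint32_t type;
  int64_t addend;
};

void writeUleb(std::vector<uint8_t>& out, uint64_t v) {
  do {
    uint8_t byte = static_cast<uint8_t>(v & 0x7f);
    v >>= 7;
    if (v != 0) byte |= 0x80;  // more bytes follow
    out.push_back(byte);
  } while (v != 0);
}

void writeSleb(std::vector<uint8_t>& out, int64_t v) {
  bool more = true;
  while (more) {
    uint8_t byte = static_cast<uint8_t>(v & 0x7f);
    v >>= 7;  // arithmetic shift keeps the sign
    // Stop once the remaining bits are just the sign extension of bit 6.
    more = !((v == 0 && !(byte & 0x40)) || (v == -1 && (byte & 0x40)));
    if (more) byte |= 0x80;
    out.push_back(byte);
  }
}

std::vector<uint8_t> encodeCrel(const std::vector<Reloc>& rels) {
  // Assumed policy: the largest shift in [0,3] that divides every offset.
  unsigned shift = 3;
  for (const Reloc& r : rels)
    while (shift > 0 && (r.offset & ((uint64_t(1) << shift) - 1)) != 0) --shift;

  std::vector<uint8_t> out;
  writeUleb(out, uint64_t(rels.size()) << 3 | 1u << 2 | shift);  // bit 2: addends present (assumed)
  uint64_t prevOffset = 0, prevAddend = 0;
  int64_t prevSym = 0, prevType = 0;
  for (const Reloc& r : rels) {
    const uint64_t dOff = (r.offset >> shift) - (prevOffset >> shift);
    const int64_t dSym = int64_t(r.symidx) - prevSym;
    const int64_t dType = int64_t(r.type) - prevType;
    const int64_t dAdd = int64_t(uint64_t(r.addend) - prevAddend);  // wraps like the decoder
    const unsigned flags = (dSym != 0) | (dType != 0) << 1 | (dAdd != 0) << 2;
    if (dOff < 16) {
      out.push_back(static_cast<uint8_t>(dOff << 3 | flags));  // one byte: delta + flags
    } else {
      // Bit 7 set: low 4 bits of the delta here, the rest as a ULEB128 continuation.
      out.push_back(static_cast<uint8_t>(0x80 | (dOff & 15) << 3 | flags));
      writeUleb(out, dOff >> 4);
    }
    if (dSym) writeSleb(out, dSym);
    if (dType) writeSleb(out, dType);
    if (dAdd) writeSleb(out, dAdd);
    prevOffset = r.offset;
    prevSym = r.symidx;
    prevType = r.type;
    prevAddend = uint64_t(r.addend);
  }
  return out;
}
```

Consider three consecutive 8-byte GOT slots at offsets 8, 16 and 24, with symbols 1, 2 and 3 and type 6 (R_X86_64_GLOB_DAT). This sketch produces 0x1F 0x0B 0x01 0x06 0x09 0x01 0x09 0x01: one header byte, three bytes for the first relocation, then two bytes for each following one.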
> > So my main issue with the proposal is that it tries too hard to compress relocations.
> > For example, using offset compression for relocations, symbol indices and even addends
> > seems to have little value: the signed offset means you lose one bit, and if out-of-range
> > values are rare or not grouped together, offset encodings are actually less efficient.
>
> I actually use an unsigned delta for the offset to save one bit, but
> signed deltas for symidx/addend.
> I have analyzed how many bits typical relocations need.
> Quoting https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf#crel-relocation-format:
>
>     Absolute symbol indexes allow one-byte encoding for symbols in the
> range [0,128) and offer a minor size advantage for static relocations
> when the symbol table is sorted by usage frequency. Delta encoding, on
> the other hand, might optimize for the case where the symbol table
> exhibits locality: neighboring symbols frequently call each other.
>
>     Delta symbol index enables one-byte encoding for GOT/PLT dynamic
> relocations when .got/.got.plt entries are ordered by symbol index.
> For example, R_*_GLOB_DAT and R_*_JUMP_SLOT relocations can typically
> be encoded with repeated 0x05 0x01 (when addend_bit==0 && shift==3:
> offset++, symidx++). Delta encoding has a disadvantage, though: it can
> only partially claim the benefit of a frequency-sorted symbol table,
> by arranging symbols in a "cold0 hot cold1" pattern.
>
>     In my experiments, absolute encoding with ULEB128 results in
> slightly larger .o file sizes for both x86-64 and AArch64 builds.
>
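The trade-off between absolute and delta symbol indexes can be measured by counting LEB128 bytes. Here is a minimal C++ sketch. It counts only the symbol-index field and assumes, as with CREL's flag bits, that an unchanged field costs nothing. The helper names ulebSize, slebSize and symidxBytes are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed to encode v as ULEB128 (7 payload bits per byte).
std::size_t ulebSize(uint64_t v) {
  std::size_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Bytes needed to encode v as SLEB128; a single byte covers [-64, 63].
std::size_t slebSize(int64_t v) {
  std::size_t n = 1;
  while (v < -64 || v > 63) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Bytes spent on the symbol-index field. The field is written only when it
// differs from the previous relocation's; `delta` selects a signed SLEB128
// difference instead of an absolute ULEB128 index.
std::size_t symidxBytes(const std::vector<uint32_t>& syms, bool delta) {
  std::size_t total = 0;
  int64_t prev = 0;
  for (uint32_t s : syms) {
    if (s != prev) total += delta ? slebSize(int64_t(s) - prev) : ulebSize(s);
    prev = s;
  }
  return total;
}
```

Take GOT entries ordered by symbol indexes 200..209. Absolute ULEB128 spends 20 bytes on the field, while signed deltas spend 11: two bytes for the first delta of 200, then one byte per +1. Now take a frequency-sorted table where hot symbols 3, 90, 5, 70, 3 are referenced in turn. Here absolute encoding spends 5 bytes and deltas spend 9.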
> For a decoder that only supports in-reloc addends (recommended for
> relocatable files), the C++ implementation is as simple as:
>
>   const auto hdr = decodeULEB128(p);
>   const size_t count = hdr / 8, shift = hdr % 4;
>   Elf_Addr offset = 0, addend = 0;
>   uint32_t symidx = 0, type = 0;
>   for (size_t i = 0; i != count; ++i) {
>     const uint8_t b = *p++;
>     offset += b >> 3;
>     if (b >= 0x80) offset += (decodeULEB128(p) << 4) - 0x10;
>     if (b & 1) symidx += decodeSLEB128(p);
>     if (b & 2) type += decodeSLEB128(p);
>     if (b & 4) addend += decodeSLEB128(p);
>     rels[i] = {offset << shift, symidx, type, addend};
>   }
>
> Using += for all of symidx/type/addend is for consistency, but the
> choice turns out to work very well too.
>
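The quoted decoder leaves decodeULEB128, decodeSLEB128, Elf_Addr and rels undefined. Below is a self-contained C++ version. It assumes 64-bit offsets and addends and returns a std::vector. The Reloc struct and the decodeCrel name are hypothetical. Like the original, it does no bounds checking, so it must only be given well-formed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded relocation; addend is kept unsigned like Elf_Addr.
struct Reloc {
  uint64_t offset;
  uint32_t symidx;
  uint32_t type;
  uint64_t addend;
};

uint64_t decodeULEB128(const uint8_t*& p) {
  uint64_t value = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *p++;
    if (shift < 64) value |= uint64_t(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return value;
}

int64_t decodeSLEB128(const uint8_t*& p) {
  uint64_t value = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *p++;
    if (shift < 64) value |= uint64_t(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 64 && (byte & 0x40)) value |= ~uint64_t(0) << shift;  // sign-extend
  return int64_t(value);
}

std::vector<Reloc> decodeCrel(const uint8_t* p) {
  const uint64_t hdr = decodeULEB128(p);
  const std::size_t count = hdr / 8;  // header: count << 3 | addend flag << 2 | shift
  const unsigned shift = hdr % 4;
  uint64_t offset = 0, addend = 0;
  uint32_t symidx = 0, type = 0;
  std::vector<Reloc> rels(count);
  for (std::size_t i = 0; i != count; ++i) {
    const uint8_t b = *p++;
    offset += b >> 3;  // offset delta; includes 0x10 when bit 7 is set
    if (b >= 0x80) offset += (decodeULEB128(p) << 4) - 0x10;
    if (b & 1) symidx += uint32_t(decodeSLEB128(p));
    if (b & 2) type += uint32_t(decodeSLEB128(p));
    if (b & 4) addend += uint64_t(decodeSLEB128(p));
    rels[i] = {offset << shift, symidx, type, addend};
  }
  return rels;
}
```

Decoding 0x1F 0x0B 0x01 0x06 0x09 0x01 0x09 0x01 yields offsets 8, 16 and 24, with symbol indexes 1, 2 and 3 and type 6. The b >= 0x80 branch works because bit 7 of the flag byte contributes 0x10 to b >> 3. Subtracting 0x10 removes that contribution, and the ULEB128 continuation supplies the delta's bits above the low four.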
> > I don't get the discussion about relocation numbers on AArch64 - 4 or 5 bits would
> > handle all frequently used relocations, so we'd just remap them to fit in the short
> > encoding. Hence I don't see a need at all for a signed offset encoding.
>
> The common static relocation types are within [257,313] (before
> R_AARCH64_PLT32).
> Delta encoding allows the type of almost every relocation except the
> first to be encoded in a single byte.
>
> How do you design a compression scheme without baked-in knowledge (a dictionary)?
> We don't want the generic encoding scheme to hard-code the relocation type
> range of each architecture.
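The AArch64 type argument is easy to check. ULEB128 needs two bytes for any value >= 128. In contrast, the difference between any two types in [257,313] is at most 56 in magnitude, so it fits the one-byte SLEB128 range [-64, 63]. The C++ sketch below counts type-field bytes for an illustrative code sequence. The type values come from the AArch64 ELF ABI; the helper names and the sequence itself are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed to encode v as ULEB128 (7 payload bits per byte).
std::size_t ulebSize(uint64_t v) {
  std::size_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Bytes needed to encode v as SLEB128; a single byte covers [-64, 63].
std::size_t slebSize(int64_t v) {
  std::size_t n = 1;
  while (v < -64 || v > 63) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Bytes spent on the type field. Both variants write the field only when the
// type changes; `delta` selects an SLEB128 difference instead of an absolute
// ULEB128 value.
std::size_t typeFieldBytes(const std::vector<uint32_t>& types, bool delta) {
  std::size_t total = 0;
  int64_t prev = 0;
  for (uint32_t t : types) {
    if (t != prev) total += delta ? slebSize(int64_t(t) - prev) : ulebSize(t);
    prev = t;
  }
  return total;
}

// Illustrative sequence: R_AARCH64_ADR_PREL_PG_HI21 (275), ADD_ABS_LO12_NC (277),
// CALL26 (283), ADR_PREL_PG_HI21 (275), LDST64_ABS_LO12_NC (286), ABS64 (257).
const std::vector<uint32_t> kTypes = {275, 277, 283, 275, 286, 257};
```

For this sequence, absolute encoding spends 12 bytes on types and delta encoding spends 7. Only the first relocation needs a two-byte value (275). No per-architecture table is involved.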