From: Fangrui Song
Date: Wed, 10 Apr 2024 19:41:16 -0700
Subject: Re: CREL dynamic relocations
To: Wilco Dijkstra
Cc: GNU C Library, Florian Weimer
List-Id: Libc-alpha mailing list

Thank you for your interest in the CREL relocation format.

On Tue, Apr 9, 2024 at 8:33 AM Wilco Dijkstra wrote:
> I like the general idea of more compact relocations, however what I don't get is
> what the overall goal is. If the goal is more compact object files, why don't we just
> add a (de)compress pass using a fast compression algorithm? CPU time is cheap
> today, and real compression easily gives 2-4x reduction of object file size, far more
> than you could achieve by just compressing relocations.

My primary goal is to make relocatable files smaller (see
https://sourceware.org/pipermail/binutils/2024-March/133229.html for a
few use cases).
Smaller files benefit applications in several ways, including less I/O
and lower linker memory usage (for linkers like gold, lld, and mold
that map input files into memory).

Generic data compression formats (like zlib or zstd) applied at the
filesystem level won't achieve this goal because they don't decrease
memory usage. In addition, filesystem compression does not appear to be
widely used.

Interestingly, I measured a 5.9% size reduction in .o files even after
zstd compression when comparing two Clang builds with and without
CREL.

    % ruby -e 'require "zstd-ruby"; un=com=0;
      Dir.glob("/tmp/out/s2-custom0/**/*.o").each{|f|
        x = File.open(f,"rb"){|h|h.read};
        un+=x.size; com+=Zstd.compress(x).size};
      puts "uncompressed: #{un}\ncompressed: #{com}"'
    uncompressed: 136086784
    compressed: 37173381

    % ruby -e 'require "zstd-ruby"; un=com=0;
      Dir.glob("/tmp/out/s2-custom1/**/*.o").each{|f|
        x = File.open(f,"rb"){|h|h.read};
        un+=x.size; com+=Zstd.compress(x).size};
      puts "uncompressed: #{un}\ncompressed: #{com}"'
    uncompressed: 111655952
    compressed: 34964421

    1-111655952/136086784 ~= 18.0% (uncompressed)
    1-34964421/37173381 ~= 5.9% (zstd)

Another objective is to minimize the size of dynamic relocations.
Android achieves this through ld.lld --pack-dyn-relocs=android+relr,
which compacts RELA relocations in their packed format.
While effective, CREL offers a simpler approach that delivers even
greater size reductions.

> Alternatively, if we wanted to access and process ELF files without any decompression,
> we could define compact relocations as fixed-size entries. Using 64 bits for a compact
> RELA relocation gives a straightforward 4x compression. Out of range values could
> use the next entry to extend the ranges.

64 bits are quite large. CREL typically needs just one to three bytes
for one relocation.
How do you design a fixed-size format that is generic enough to work
with all relocation types and symbol indexes?
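
To make the "one to three bytes" figure easier to picture, here is a minimal sketch of how many bytes a LEB128-encoded field takes for small values. The helper names `uleb128Size` and `sleb128Size` are hypothetical and not taken from any CREL implementation; they only count bytes, assuming the standard LEB128 rule of 7 payload bits per byte.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes ULEB128 needs for `v`.
// Each byte carries 7 payload bits, so [0,128) fits in one byte.
std::size_t uleb128Size(std::uint64_t v) {
  std::size_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Hypothetical helper: number of bytes SLEB128 needs for `v`.
// One byte covers [-64,64); two bytes cover [-8192,8192).
// Assumes >> on a negative value is an arithmetic shift
// (guaranteed since C++20, and what mainstream compilers do anyway).
std::size_t sleb128Size(std::int64_t v) {
  std::size_t n = 1;
  while (v < -64 || v >= 64) {
    v >>= 7;
    ++n;
  }
  return n;
}
```

Under these assumptions a small delta costs a single byte, against the fixed 8 bytes of a 64-bit entry, and even a delta of a few thousand stays at two bytes.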

> So my main issue with the proposal is that it tries too hard to compress relocations.
> For example using offset compression for relocations, symbol indices and even addends
> seems to have little value: the signed offset means you lose one bit, and if out of range
> values are rare or not grouped together, offset encodings are actually less efficient.

I actually use an unsigned delta offset to save one bit, but signed
deltas for symidx/addend.
I have analyzed how many bits are needed by typical relocations.

Quote https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf#crel-relocation-format :

Absolute symbol indexes allow one-byte encoding for symbols in the
range [0,128) and offer a minor size advantage for static relocations
when the symbol table is sorted by usage frequency. Delta encoding, on
the other hand, might optimize for the scenario when the symbol table
presents locality: neighbor symbols are frequently mutually called.

Delta symbol index enables one-byte encoding for GOT/PLT dynamic
relocations when .got/.got.plt entries are ordered by symbol index.
For example, R_*_GLOB_DAT and R_*_JUMP_SLOT relocations can typically
be encoded with repeated 0x05 0x01 (when addend_bit==0 && shift==3,
offset++, symidx++).

Delta encoding has a disadvantage. It can only partially claim the
sorted-by-frequency optimization, by arranging symbols in a
"cold0 hot cold1" pattern.

In my experiments, absolute encoding with ULEB128 results in slightly
larger .o file sizes for both x86-64 and AArch64 builds.
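
The trade-off between absolute and delta symbol indexes can be sketched by counting bytes for a given sequence of indexes. This is only an illustration: the function names are hypothetical, the sketch counts the symidx field alone (not the leading offset/flags byte), and it assumes a zero delta costs no bytes because the corresponding flag bit can simply stay clear.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers: encoded size of one ULEB128 / SLEB128 value.
std::size_t uleb128Size(std::uint64_t v) {
  std::size_t n = 1;
  while (v >= 0x80) { v >>= 7; ++n; }
  return n;
}
std::size_t sleb128Size(std::int64_t v) {
  std::size_t n = 1;
  while (v < -64 || v >= 64) { v >>= 7; ++n; }  // arithmetic shift assumed
  return n;
}

// Bytes spent on symbol indexes when each is stored as an absolute ULEB128.
std::size_t absoluteSymidxBytes(const std::vector<std::uint32_t> &syms) {
  std::size_t total = 0;
  for (std::uint32_t s : syms)
    total += uleb128Size(s);
  return total;
}

// Bytes spent when each index is stored as a SLEB128 delta from the
// previous one (starting from 0). A zero delta is assumed to cost nothing.
std::size_t deltaSymidxBytes(const std::vector<std::uint32_t> &syms) {
  std::size_t total = 0;
  std::int64_t prev = 0;
  for (std::uint32_t s : syms) {
    const std::int64_t delta = static_cast<std::int64_t>(s) - prev;
    if (delta != 0)
      total += sleb128Size(delta);
    prev = s;
  }
  return total;
}
```

For consecutive indexes such as {200, 201, 202, 203}, as in a GOT ordered by symbol index, the delta form needs 5 bytes against 8 for the absolute form. For a sequence that jumps around, such as {5, 300, 6, 301}, the delta form needs 7 bytes against 6, which is the disadvantage described above.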

For a decoder that only supports in-reloc addends (recommended for
relocatable files), the C++ implementation is as simple as:

  const auto hdr = decodeULEB128(p);
  const size_t count = hdr / 8, shift = hdr % 4;
  Elf_Addr offset = 0, addend = 0;
  uint32_t symidx = 0, type = 0;
  for (size_t i = 0; i != count; ++i) {
    const uint8_t b = *p++;
    offset += b >> 3;
    if (b >= 0x80)
      offset += (decodeULEB128(p) << 4) - 0x10;
    if (b & 1)
      symidx += decodeSLEB128(p);
    if (b & 2)
      type += decodeSLEB128(p);
    if (b & 4)
      addend += decodeSLEB128(p);
    rels[i] = {offset << shift, symidx, type, addend};
  }

Using += for all of symidx/type/addend is for consistency, but the
choice turns out to be very good as well.

> I don't get the discussion about relocation numbers on AArch64 - 4 or 5 bits would
> handle all frequently used relocations, so we'd just remap them to fit in the short
> encoding. Hence I don't see a need at all for a signed offset encoding.

The common static relocation types are within [257,313] (before
R_AARCH64_PLT32).
Delta encoding allows nearly all relocation types except the first one
to be encoded in a single byte.

How do you design a compression scheme without baked-in knowledge (a dictionary)?
We don't want the generic encoding scheme to hard-code a relocation
type range for each architecture.
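
For readers who want to try the decoder loop quoted in this message, here is a self-contained sketch with the pieces the snippet leaves out. The `Rela` struct, the `decodeCrel` wrapper, and the LEB128 helpers are assumptions written for illustration (the helpers do no bounds checking and expect well-formed input). The loop body follows the snippet as written, including the fact that it reads only the count and shift from the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative record type; field names are assumptions.
struct Rela {
  std::uint64_t offset;
  std::uint32_t symidx, type;
  std::int64_t addend;
};

// Minimal LEB128 readers that advance `p`. No bounds checks.
std::uint64_t decodeULEB128(const std::uint8_t *&p) {
  std::uint64_t result = 0;
  unsigned shift = 0;
  std::uint8_t byte;
  do {
    byte = *p++;
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return result;
}

std::int64_t decodeSLEB128(const std::uint8_t *&p) {
  std::uint64_t result = 0;
  unsigned shift = 0;
  std::uint8_t byte;
  do {
    byte = *p++;
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 64 && (byte & 0x40))  // sign-extend a negative value
    result |= ~std::uint64_t{0} << shift;
  return static_cast<std::int64_t>(result);
}

// Hypothetical wrapper around the loop from the message.
std::vector<Rela> decodeCrel(const std::uint8_t *p) {
  assert(p != nullptr);
  const std::uint64_t hdr = decodeULEB128(p);
  const std::size_t count = hdr / 8, shift = hdr % 4;
  std::uint64_t offset = 0, addend = 0;
  std::uint32_t symidx = 0, type = 0;
  std::vector<Rela> rels;
  rels.reserve(count);
  for (std::size_t i = 0; i != count; ++i) {
    const std::uint8_t b = *p++;
    offset += b >> 3;  // low delta-offset bits, plus 0x10 if bit 7 is set
    if (b >= 0x80)     // remaining delta-offset bits follow as ULEB128
      offset += (decodeULEB128(p) << 4) - 0x10;
    if (b & 1)
      symidx += static_cast<std::uint32_t>(decodeSLEB128(p));
    if (b & 2)
      type += static_cast<std::uint32_t>(decodeSLEB128(p));
    if (b & 4)
      addend += static_cast<std::uint64_t>(decodeSLEB128(p));
    rels.push_back({offset << shift, symidx, type,
                    static_cast<std::int64_t>(addend)});
  }
  return rels;
}
```

As a hand-built example, the nine bytes 0x16 0x0B 0x03 0x9B 0x02 0xA6 0x01 0x78 0x10 describe two relocations with shift 2. The first type, 283, takes two bytes, while the second is stored as the one-byte delta -8 to reach 275, which shows how delta encoding keeps types in the [257,313] range to a single byte after the first relocation.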