From: Simon Josefsson via Gnulib discussion list
Reply-To: Simon Josefsson
To: Bruno Haible
Cc: Paul Eggert, bug-gnulib@gnu.org
Subject: Re: RFC: git-commit based mtime-reproducible tarballs
Date: Mon, 16 Jan 2023 09:40:37 +0100
Message-ID: <875yd6dg8q.fsf@josefsson.org>
In-Reply-To: <2740098.11c6FMkHaZ@nimes> (Bruno Haible's message of "Sun, 15 Jan 2023 23:25:58 +0100")
References: <87h6wtgmhy.fsf__22556.7857896507$1673713908$gmane$org@redhat.com> <5459006.YCjZZlMYnJ@nimes> <2740098.11c6FMkHaZ@nimes>

Bruno Haible writes:

> Paul Eggert wrote:
>> some users want to "trust but verify" and a reproducible
>> tarball is easier to audit than a non-reproducible one, so for these
>> users it can be a win to omit the irrelevant data from the tarball.
>
> Reproducibility can be implemented in different ways:
>   - by omitting irrelevant data from the tarball,
>   - by having a customized comparison program 'diff', such that
>       "diff --ignore-irrelevant-metadata contents1 contents2"
>     would ignore the irrelevant parts.

The problem with a --ignore-irrelevant-metadata approach is that what
counts as irrelevant is a judgement call, and two projects may have
different philosophies that are mutually incompatible.  A devil's
advocate case: consider a build system that embeds the source-code
timestamp information in the binary, and the binary sends off a hash of
its own executable to a remote server for verification purposes.  In
some projects this may be what you want to achieve.  Ignoring this
particular metadata would then be a critical failure for that project.

I think it is a worthy goal to reach a tarball that is deterministically
and one-way reproducible from git source code [for the same set of tool
versions].

>> when I do an 'ls
>> -l' of a source directory that I got from a distribution tarball, it's
>> useful to see the last time the contents of each source file was changed
>> upstream.
>
> OK, now we're discussing different ways to make a tarball reproducible.
> That's nice, because Simon's proposal was to make all timestamps equal,
> and that puts me off.
>   In binutils-2.40.tar.bz2 all files are from 2023-01-14.
>   In android-studio-2021.3.1.17-linux.tar.gz all files are from 2010-01-01.
> It gives me as a user no idea whether this tarball is 13 years old,
> 2 years old, or from yesterday.
>
> I much prefer Paul's approach, since it still conveys meaningful
> timestamps:

I agree!  I even wonder whether the binutils tarball builds properly on,
say, HP-UX then?

>> For TZDB, where users have long wanted reproducibility, I use something
>> like this in a Makefile recipe for each source file $$file:
>>
>>    time=`git log -1 --format='tformat:%ct' $$file` &&
>>    touch -cmd @$$time $$file
>
> That's good for the files that are under version control.
>
>> 2. What about platform-independent files that are automatically created
>> from source files from the repository, and that are shipped in the
>> release tarball?
>
> For these, you could unpack the tarball, see in which order the timestamps
> are, and then assign artificial timestamps, in the same order but exactly
> 2 seconds apart. For example, if the tarball contains
>   under version control:
>     hello.c       2023-01-14 13:28:14
>     configure.ac  2023-01-01 14:03:07
>   and not under version control:
>     configure     2023-01-15 04:09:10
>     config.h.in   2023-01-15 04:05:19
> then you would determine the
>   max_timestamp_under_vc = max { 2023-01-14 13:28:14, 2023-01-01 14:03:07 }
>                          = 2023-01-14 13:28:14
> and then, since config.h.in is older than configure:
>   touch -m (max_timestamp_under_vc + 2 seconds) config.h.in
>   touch -m (max_timestamp_under_vc + 4 seconds) configure
>
> You can do this without knowing the Makefile rules or scripts which created
> config.h.in and configure.
>
> The increment of 2 seconds is, of course, for VFAT file systems, which have
> only 2 seconds of resolution for file modification times.

Clever!

To implement this we would need a dist-hook to do the 'touch -m ...'
dance on all files.  I somewhat fear that, because of its complexity,
the solution will cause more trouble than the original problem.

Does anyone see a problem with this approach?  Do you think it is a good
idea?  I like it and see no further problems apart from the complexity,
which I don't see a way to reduce.

/Simon
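
For concreteness, the 'touch -m ...' dance could look roughly like the
sketch below.  It is only an illustration under stated assumptions, not a
tested dist-hook: it assumes GNU coreutils (touch -d @SECONDS, stat -c),
file names without newlines or characters that git would quote, and the
function names stamp_tracked_files and stamp_generated_files are made up
for this example.  The first function applies the git-commit-time recipe
quoted above to every tracked file and prints the newest commit time; the
second keeps the existing relative order of the generated files and spaces
them 2 seconds apart after that time.

```shell
#!/bin/sh
# Sketch only.  Assumes GNU touch/stat; function names are hypothetical.

# Set each version-controlled file's mtime to the time of the last commit
# that touched it, and print the newest such time (seconds since the epoch).
stamp_tracked_files () {
  git ls-files | while IFS= read -r file; do
    time=$(git log -1 --format='tformat:%ct' -- "$file") &&
    touch -cmd "@$time" "$file" &&
    echo "$time"
  done | sort -n | tail -n 1
}

# Usage: stamp_generated_files BASE FILE...
# Give the generated files artificial mtimes BASE+2, BASE+4, ... in the
# order of their current mtimes (oldest first), so that make still sees
# the same "older than" relationships.
stamp_generated_files () {
  base=$1; shift
  [ $# -gt 0 ] || return 0
  next=$base
  stat -c '%Y %n' -- "$@" | sort -n | cut -d' ' -f2- |
  while IFS= read -r file; do
    next=$((next + 2))          # 2-second step, for VFAT resolution
    touch -cmd "@$next" "$file"
  done
}
```

Run from the top of a checkout, this would be used as
max=$(stamp_tracked_files) followed by
stamp_generated_files "$max" configure config.h.in.  With the example
timestamps above, config.h.in ends up 2 seconds and configure 4 seconds
after hello.c's commit time.  Generated files whose current mtimes fall in
the same second are ordered by name here, which may or may not be what a
real hook wants.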