From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 15 Jan 2023 08:03:35 -0800
To: Bruno Haible
Cc: Simon Josefsson, bug-gnulib@gnu.org
References: <87h6wtgmhy.fsf__22556.7857896507$1673713908$gmane$org@redhat.com> <87lem4cb9v.fsf@josefsson.org> <5459006.YCjZZlMYnJ@nimes>
From: Paul Eggert
Organization: UCLA Computer Science Department
Subject: Re: RFC: git-commit based mtime-reproducible tarballs
In-Reply-To: <5459006.YCjZZlMYnJ@nimes>
List-Id: Gnulib discussion list

On 2023-01-15 05:21, Bruno Haible wrote:
> Reproducibility is about verifying that an artifact A was generated
> from a source S.

Quite true. However, there's something else going on: when I do an
'ls -l' of a source directory that I got from a distribution tarball,
it's useful to see the last time the contents of each source file was
changed upstream. When sources are in a Git repository, I've found the
commit timestamp to be a good representation for that.

For TZDB, where users have long wanted reproducibility, I use something
like this in a Makefile recipe for each source file $$file:

  time=`git log -1 --format='tformat:%ct' $$file` && touch -cmd @$$time $$file

Here are three problems I ran into with this approach, and the
solutions that TZDB uses:

1. As you mentioned, what if you're building a release from sources
that have not yet been committed? In this case TZDB's Makefile recipe
warns but goes ahead with the timestamp that the working file already
has.

2. What about platform-independent files that are automatically created
from source files from the repository, and that are shipped in the
release tarball? In this case, the TZDB Makefile arranges for each such
file to have a timestamp one second later than the maximum of
timestamps of files that the file depends on. This step is the biggest
hassle, since it means I need to repeat in the Makefile the logic that
'make' already uses when calculating dependencies.

3. What about tarball metadata other than last-modified time?
Here, TZDB uses the following GNU Tar options:

  GNUTARFLAGS= --format=pax --pax-option='delete=atime,delete=ctime' \
    --numeric-owner --owner=0 --group=0 \
    --mode=go+u,go-w --sort=name

The need for most of this should be obvious, if one wants the tarball
to be reproducible. However, some details are less obvious.

GNUTARFLAGS specifies pax format because the default GNU Tar format
becomes unportable after 2242-03-16 12:56:32 UTC due to the 33-bit
limitation of ustar. And GNUTARFLAGS uses delete=atime,delete=ctime so
that atime and ctime do not leak into the tarball and make it less
reproducible; since mtime values are always a multiple of 1 second
(given steps 1 and 2) this means the tarball will be ustar-compatible
until 2242, giving users *plenty* of time to prepare for pax format
timestamps.

There is an argument that we need not have a fancy GNUTARFLAGS like
this, because I'm signing the tarballs and users have to trust me
anyway. Still, some users want to "trust but verify" and a reproducible
tarball is easier to audit than a non-reproducible one, so for these
users it can be a win to omit the irrelevant data from the tarball.
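
For readers who want to see the rule from step 2 spelled out, here is a
rough Python sketch. It is not the TZDB Makefile logic, only an
illustration of "one second later than the newest dependency". The
names derived_mtime and stamp_generated are made up for this example,
and the dependency list has to be supplied by the caller.

```python
import os


def derived_mtime(dep_mtimes):
    """Return the timestamp for a generated file.

    Rule from step 2: one second later than the newest dependency.
    dep_mtimes is an iterable of whole-second timestamps and must not
    be empty (max() raises ValueError otherwise).
    """
    return max(dep_mtimes) + 1


def stamp_generated(path, deps):
    """Set the mtime of a generated file from its dependencies.

    Hypothetical helper: 'deps' is a list of paths that 'path' was
    built from. Sub-second parts of the dependency mtimes are dropped
    so the result stays a multiple of one second.
    """
    new_mtime = derived_mtime(int(os.stat(d).st_mtime) for d in deps)
    atime = os.stat(path).st_atime  # leave the access time alone
    os.utime(path, (atime, new_mtime))
    return new_mtime
```

The sketch takes the dependency list as an argument, which is exactly
the information that otherwise has to be repeated by hand alongside the
rules 'make' already knows.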
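
The 2242 date follows from the ustar header layout: the mtime field is
12 bytes wide and holds 11 octal digits plus a terminator, which is 33
bits. A small Python check of that arithmetic follows. The name
fits_ustar is made up for this example, and it tests only the
timestamp, not the other header fields.

```python
from datetime import datetime, timedelta, timezone

# 11 octal digits in the ustar mtime field give 33 bits.
USTAR_MTIME_DIGITS = 11
USTAR_MTIME_MAX = 8 ** USTAR_MTIME_DIGITS - 1  # same as 2**33 - 1

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# First timestamp that no longer fits in the ustar field.
first_unrepresentable = EPOCH + timedelta(seconds=USTAR_MTIME_MAX + 1)


def fits_ustar(mtime):
    """True if mtime (seconds since the epoch) fits a plain ustar header.

    It must be a non-negative whole number of seconds no larger than
    USTAR_MTIME_MAX; anything else needs a pax extended header.
    """
    return mtime == int(mtime) and 0 <= mtime <= USTAR_MTIME_MAX


print(first_unrepresentable.isoformat())  # 2242-03-16T12:56:32+00:00
```

Whole-second mtimes below that limit need no extended header for the
timestamp, which is why steps 1 and 2 plus the deleted atime and ctime
keep the tarball ustar-compatible.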