bug-gnulib@gnu.org mirror (unofficial)
From: Paul Eggert <eggert@cs.ucla.edu>
To: Bruno Haible <bruno@clisp.org>
Cc: Simon Josefsson <simon@josefsson.org>, bug-gnulib@gnu.org
Subject: Re: RFC: git-commit based mtime-reproducible tarballs
Date: Sun, 15 Jan 2023 08:03:35 -0800
Message-ID: <ba9aaa86-ccc2-1912-9cea-42b99a88b6da@cs.ucla.edu>
In-Reply-To: <5459006.YCjZZlMYnJ@nimes>

On 2023-01-15 05:21, Bruno Haible wrote:
> Reproducibility is about verifying that an artifact A was generated
> from a source S.

Quite true. However, there's something else going on: when I do an
'ls -l' of a source directory that I got from a distribution tarball,
it's useful to see the last time the contents of each source file were
changed upstream. When sources are in a Git repository, I've found the
commit timestamp to be a good representation of that.

For TZDB, where users have long wanted reproducibility, I use something 
like this in a Makefile recipe for each source file $$file:

	      time=`git log -1 --format='tformat:%ct' $$file` &&
	      touch -cmd @$$time $$file
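For experimenting outside Make, the two recipe lines above can be mirrored in Python. This is a minimal sketch, not TZDB's actual recipe: the function name `set_mtimes_from_git` is hypothetical, and it assumes `git` is on PATH and that the script runs inside the work tree. It also anticipates point 1 below by leaving a file's timestamp alone, with a warning, when the file is untracked or differs from the last commit (checked here with `git diff --quiet HEAD`, an assumption about how to detect that case).

```python
import os
import subprocess
import sys

def set_mtimes_from_git(paths):
    """Set each file's mtime to the time of the last commit that touched it.

    Files that are untracked or differ from HEAD keep their current
    mtime and get a warning instead.  Returns the paths actually changed.
    """
    changed = []
    for path in paths:
        # Exit status 0 means the working file matches HEAD.
        clean = subprocess.run(
            ["git", "diff", "--quiet", "HEAD", "--", path]).returncode == 0
        # Same query as the recipe: git log -1 --format='tformat:%ct' FILE
        stamp = subprocess.run(
            ["git", "log", "-1", "--format=tformat:%ct", "--", path],
            capture_output=True, text=True, check=True).stdout.strip()
        if not (clean and stamp):
            # Untracked (empty stamp) or locally modified: keep the old mtime.
            print(f"{path}: warning: does not match repository",
                  file=sys.stderr)
            continue
        # Like 'touch -cm': change only the mtime, leaving atime as it was.
        os.utime(path, (os.stat(path).st_atime, int(stamp)))
        changed.append(path)
    return changed
```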

Here are three problems I ran into with this approach, and the solutions 
that TZDB uses:

1. As you mentioned, what if you're building a release from sources that 
have not yet been committed? In this case TZDB's Makefile recipe warns 
but goes ahead with the timestamp that the working file already has.

2. What about platform-independent files that are generated automatically
from source files in the repository and shipped in the release tarball?
In this case, the TZDB Makefile arranges for each such file to have a
timestamp one second later than the latest timestamp of the files it
depends on. This step is the biggest hassle, since it means I need to
repeat in the Makefile the logic that 'make' already uses when
calculating dependencies.

3. What about tarball metadata other than last-modified time? Here, TZDB 
uses the following GNU Tar options:

   GNUTARFLAGS= --format=pax --pax-option='delete=atime,delete=ctime' \
   --numeric-owner --owner=0 --group=0 \
   --mode=go+u,go-w --sort=name

The need for most of this should be obvious if one wants the tarball to
be reproducible. However, some details are less obvious. GNUTARFLAGS
specifies pax format because the default GNU Tar format becomes
unportable after 2242-03-16 12:56:32 UTC, as ustar's mtime field holds
only 33 bits. And GNUTARFLAGS uses delete=atime,delete=ctime so that
atime and ctime do not leak into the tarball and make it less
reproducible. Since mtime values are always a multiple of 1 second
(given steps 1 and 2), no pax extended headers are needed for
timestamps, so the tarball will be ustar-compatible until 2242, giving
users *plenty* of time to prepare for pax format timestamps.
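The bookkeeping in point 2 can be sketched in Python. This is an illustration, not how the TZDB Makefile does it: `stamp_derived_files` is a hypothetical helper, and its dependency map has to be written out by hand, which is the same duplication of `make`'s knowledge that point 2 complains about. Targets are visited in dependency order using `graphlib.TopologicalSorter` (Python 3.9 or later), so a file derived from another derived file sees that file's adjusted timestamp. The sketch rounds the newest input time down to a whole second before adding one, so results stay whole seconds.

```python
import graphlib
import os

def stamp_derived_files(deps):
    """Give each derived file an mtime one second after its newest input.

    DEPS maps each derived file to the (non-empty) list of files it is
    built from.  Inputs that are not themselves keys of DEPS are treated
    as sources whose timestamps are already correct.
    Returns {target: new_mtime}.
    """
    result = {}
    # static_order() yields every node after all of its prerequisites.
    for target in graphlib.TopologicalSorter(deps).static_order():
        if target not in deps:
            continue  # a plain source file; leave it alone
        newest = max(os.stat(dep).st_mtime for dep in deps[target])
        # Round down to a whole second and add one: the result is strictly
        # later than every input and is a whole number of seconds.
        t = int(newest) + 1
        os.utime(target, (t, t))
        result[target] = t
    return result
```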
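To experiment with the settings in point 3 without GNU Tar, Python's standard `tarfile` module can approximate them. This is a sketch, not an equivalent of the GNU Tar command line: the helper names are hypothetical, `sorted()` over a flat list of relative names stands in for `--sort=name`, storing empty user and group names is an assumption about the closest analogue of `--numeric-owner`, and any fractional mtime is simply truncated (TZDB instead relies on steps 1 and 2 to keep mtimes whole). `tarfile` records no atime or ctime unless told to, so nothing corresponds to `delete=atime,delete=ctime`. The last few lines check the 33-bit ustar limit: 8**11 seconds after the epoch is 2242-03-16 12:56:32 UTC.

```python
import datetime
import os
import tarfile

def _normalize(info):
    """Adjust one member's metadata roughly as TZDB's GNUTARFLAGS do."""
    info.uid = info.gid = 0               # --owner=0 --group=0
    info.uname = info.gname = ""          # assumed analogue of --numeric-owner
    user = (info.mode >> 6) & 0o7         # the owner's rwx bits
    info.mode = (info.mode | (user << 3) | user) & ~0o022  # --mode=go+u,go-w
    info.mtime = int(info.mtime)          # whole seconds: no pax 'mtime' record
    return info

def reproducible_tar(root, names, out_path, fmt=tarfile.PAX_FORMAT):
    """Archive regular files NAMES (relative to ROOT) into OUT_PATH."""
    with tarfile.open(out_path, "w", format=fmt) as tar:
        for name in sorted(names):        # stand-in for --sort=name
            tar.add(os.path.join(root, name), arcname=name,
                    recursive=False, filter=_normalize)

def fits_ustar(mtime):
    """True if MTIME fits in a plain ustar header's 12-byte octal field."""
    info = tarfile.TarInfo("probe")
    info.mtime = mtime
    try:
        info.tobuf(format=tarfile.USTAR_FORMAT)
        return True
    except ValueError:                    # tarfile reports numeric overflow
        return False

# 11 octal digits hold 8**11 == 2**33 distinct values.
USTAR_MTIME_LIMIT = 8 ** 11
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
USTAR_LIMIT_UTC = EPOCH + datetime.timedelta(seconds=USTAR_MTIME_LIMIT)
print(USTAR_LIMIT_UTC)  # 2242-03-16 12:56:32+00:00
```

In this sketch, with whole-second mtimes and short ASCII names, the pax-format archive comes out byte-for-byte identical to a ustar-format one, which is the ustar compatibility described above.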

There is an argument that we need not have a fancy GNUTARFLAGS like
this, because I'm signing the tarballs and users have to trust me
anyway. Still, some users want to "trust but verify", and a reproducible
tarball is easier to audit than a non-reproducible one, so for these
users it can be a win to omit the irrelevant data from the tarball.
