From: Bruno Haible <bruno@clisp.org>
To: bug-gnulib@gnu.org, Simon Josefsson <simon@josefsson.org>
Cc: Nick Bowler <nbowler@draconx.ca>
Subject: git repositories vs. tarballs
Date: Mon, 15 Apr 2024 02:19:20 +0200
Message-ID: <26004807.znHjZqTEeJ@nimes>

Hi Simon,

In the other thread [1][2][2a] (see also [3] and [4]), you asked:

> Has this changed, so we should recommend maintainers
> to 'EXTRA_DIST = bootstrap bootstrap-funclib.sh bootstrap.conf' so this
> is even possible?

1) I think changing the contents of a tarball ad hoc, like this, will not
   lead to satisfactory results, because too many packages will do things
   differently.

   Instead, we should ask the question "for which purposes is <an artifact>
   going to be used?" or "which operations are supported on <an artifact>?".
   Once there is agreement on this question, the contents of the artifact
   will necessarily follow.

2) When considering
     (A) git repositories (or tar.gz files containing their contents,
         e.g. the "snapshot" on https://git.savannah.gnu.org/gitweb/?p=PACKAGE.git
         or the "Download ZIP" on https://github.com/TEAM/PACKAGE),
     (C) a tarball as published on ftp.gnu.org,
   it is also useful to consider
     (E) a binary package .tar.gz / .rpm / .deb
   because there is already a lot of experience with doing "reproducible
   builds" from (C) to (E) [5][6].

3) So, what are the purposes of (A), (C), (E)?

   So far, the answer has been:
     (A) is for users with developer skills, the preferred way to work
         with the source code, including branching and merging of branches.
     (C) is for users and distros, to apply relatively small modifications
         and then build binaries of the package for one or more architectures,
         without needing to fetch anything (other than build prerequisites)
         from the network.
     (E) is for users, to install the package on a specific machine, without
         needing development tools.

4) What do the reproducible builds from (C) to (E) mean? The purpose of (E)
   changes to
     (E+) Like (E), plus:
          A user _with_ development tools can determine whether (E) was
          built with a published build recipe, without tampering.
   Note that this requires
     - formalizing the notion of a build environment [7],
     - adding this build environment into (E) (not yet complete for Debian [8]).
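
   As an illustration of what the check in (E+) amounts to: the user rebuilds
   the package from (C) in the recorded build environment and compares the
   result with the published (E). A minimal sketch, assuming both files are
   available locally; the function name is_reproduced and the file names in
   the usage comment are made up for this example:

```sh
# is_reproduced PUBLISHED REBUILT
# Succeeds (exit status 0) only if the two files are byte-for-byte identical.
# Any difference, or a missing file, yields a non-zero status.
is_reproduced () {
  cmp -s "$1" "$2"
}

# Possible usage (hypothetical file names):
#   is_reproduced published/foo_1.0_amd64.deb rebuilt/foo_1.0_amd64.deb \
#     && echo "matches the published build recipe"
```

   The comparison itself is the easy part; the hard part is reproducing the
   build environment precisely enough that the rebuilt file can be identical.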

5) There are two wishes that are not yet satisfied by (A) and (C):
   (X) Many users without developer skills are turning to the git repository
       and trying to build from there.
   (Y) Some distros want to be able to verify the tarballs.[9] (I don't agree
       with this. If you can't trust the release manager who produced the
       tarballs (C), you cannot trust (A) either. If there is a mechanism
       for verifying (C) from (A), criminals will commit their malware
       entirely into (A).)

6) How could (X) be implemented?

   The main differences between (A) and (C) are [10]:
     - Tarballs contain source code from other packages.
     - Tarballs contain generated files.
     - Tarballs contain localizations.

   I could imagine an intermediate step between (A) and (C):

     (B) is for users with many packages installed and for distros, to apply
         modifications (even to the set of gnulib modules) and then build
         binaries of the package for one or more architectures, without
         needing to fetch anything (other than build prerequisites) from the
         network.

   This is a different stage from (A), because most developers don't want
   to commit source code from other packages into (A) (due to size), nor
   to commit generated files into (A) (due to hassles with branches).

   Going from (A) to (B) means pulling additional sources from the network.
   It could be implemented
     - by "git submodule update --init", or
     - by 'npm' for JavaScript packages, or
     - by 'cargo' for Rust packages [11]
   and, for the localizations:
     - essentially by a 'wget' command that fetches the *.po files.

   The proposed name of a script that does this is 'autopull.sh'.
   But I am equally open to a declarative YAML file instead of a shell script.
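
   To make the (A) to (B) step concrete, here is a rough sketch of what such
   an 'autopull.sh' might contain. It is only an illustration under stated
   assumptions: the function names are invented, the translations URL is a
   placeholder, and a real package would likely need more than this.

```sh
# Hypothetical autopull.sh: the (A) -> (B) step.
# Everything that needs network access lives here and nowhere else.

# Fetch the sources of other packages that are tracked as git submodules.
# Does nothing (and succeeds) if the package has no submodules.
pull_submodules () {
  if test -f .gitmodules; then
    git submodule update --init
  fi
}

# pull_po_files URL DIR
# Fetch the *.po files found directly under URL into DIR.
# URL is whatever directory listing the package's translators publish to;
# this sketch does not assume any particular service.
pull_po_files () {
  po_url=$1
  po_dir=$2
  mkdir -p "$po_dir" || return 1
  wget --mirror --level=1 -nd -nv -A.po -P "$po_dir" "$po_url"
}

# Possible usage (placeholder URL):
#   pull_submodules &&
#   pull_po_files "https://translations.example.org/latest/PACKAGE/" po
```

   Keeping all downloads in one place means that the later steps can run
   without network access.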

   Going from (B) to (C) means generating files, through invocations of
   gnulib-tool, bison, flex, ... for the code and groff, texinfo, doxygen, ...
   for the documentation.

   The proposed name of a script that does this is 'autogen.sh'.
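
   A correspondingly rough sketch of the (B) to (C) step follows. Again, this
   is an illustration rather than a proposal for the actual script: the
   function names are invented, and the package-specific generator calls are
   only indicated by a comment.

```sh
# Hypothetical autogen.sh: the (B) -> (C) step.
# It only generates files from what is already present; no network access.

# require_tool NAME
# Fail early with a clear message if a generator program is missing.
require_tool () {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "autogen.sh: required tool '$1' not found" >&2
    return 1
  fi
}

generate_files () {
  require_tool autoreconf || return 1
  # Package-specific generators (gnulib-tool, bison, flex, ...) for the code
  # and (groff, texinfo, doxygen, ...) for the documentation would be
  # invoked here, each guarded by require_tool.
  autoreconf --install
}

# Possible usage, from the top-level directory of (B):
#   generate_files
```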

7) How could (Y) be implemented?
   Like in (E+), we would define:

     (C+) Like (C), plus:
          A user with all kinds of special tools can determine whether (C)
          was built with a published build recipe, without tampering.

   Again, this requires
     - formalizing the notion of a build environment,
     - adding this build environment into (C).

   For example, we would need a way to specify a build dependency on a
   particular version of groff or texinfo or doxygen (for the documentation),
   a particular version of m4, autoconf, automake (for the configure script
   and Makefile.ins).

   So far, some people have published their build environment in the form of
   ad-hoc plain text ("This release was bootstrapped with the following tools")
   inside release announcements. [12] Of course, that's the wrong place to
   do so, because a user who receives (C) and wants to verify it does not
   want to search for the release announcement in order to get the build
   environment.
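
   One conceivable way to keep this information with (C) is to write it into
   a file at release time and ship that file inside the tarball. A minimal
   sketch, with the caveats that the function name and the file name
   BUILD-ENVIRONMENT are invented here, and that a list of version strings
   alone is most likely not a complete description of a build environment:

```sh
# record_build_env OUTFILE TOOL...
# Write one line per tool to OUTFILE: the tool's name followed by the
# first line of its `--version` output, or "not found" if it is absent.
record_build_env () {
  out=$1
  shift
  : > "$out" || return 1
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      version=$("$tool" --version 2>/dev/null | sed -n 1p)
      printf '%s: %s\n' "$tool" "${version:-unknown}" >> "$out"
    else
      printf '%s: not found\n' "$tool" >> "$out"
    fi
  done
}

# Possible usage (hypothetical output file name):
#   record_build_env BUILD-ENVIRONMENT m4 autoconf automake groff makeinfo doxygen
```

   The resulting file would then need to be added to the distributed files,
   so that whoever receives (C) has it at hand.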

   Some people are suggesting that (Y) could be implemented on top of (X) [9].
   That is, the distro should start from (B), not (C). However, I think this
   does not change the problem much. The user's question "can I trust (C),
   built by the package's release manager" is replaced with two questions
     "can I trust (B), built by the package's release manager" and
     "can I trust (C), built by the distro's build service".

Please respond with an appropriately set "Subject"! There are many topics here.

Bruno

[1] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00150.html
[2] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00163.html
[2a] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00164.html
[3] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00017.html
[4] https://blog.josefsson.org/2024/04/01/towards-reproducible-minimal-source-code-tarballs-please-welcome-src-tar-gz/
[5] https://reproducible-builds.org/
[6] https://wiki.debian.org/ReproducibleBuilds
[7] https://reproducible-builds.org/docs/recording/
[8] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=763822
[9] https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/YWMNOEJ34Q7QLBWQAB5TM6A2SVJFU4RV/
[10] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00136.html
[11] https://doc.rust-lang.org/stable/cargo/guide/why-cargo-exists.html
[12] https://lists.gnu.org/archive/html/info-gnu/2024-01/msg00015.html




