Bruno Haible writes:

> Hi Simon,
>
>> you can ... via
>> GNULIB_REVISION pick out exactly the gnulib git revision that libpaper
>> needs. ...
>> [1] https://blog.josefsson.org/2024/04/13/reproducible-and-minimal-source-only-tarballs/
>> [2] https://salsa.debian.org/auth-team/libntlm/-/tree/master/debian
>
> I see GNULIB_REVISION as an obsolete alternative to git submodules, and
> would therefore discourage rather than propagate its use.

I think it will be challenging for gnulib to insist on always being used
as a git submodule, and I would prefer that we continue to support
multiple ways of working.

Personally I have been migrating towards gnulib git submodules because
most other projects use gnulib like that, but I've never really felt
comfortable with them. Some of the concerns I have:

- git submodules lead to -- in my subjective opinion -- complexity,
  which leads to a worse user experience for developers. I have learned
  to work with git submodules over the years, but it was a hurdle that
  I don't want to force on everyone.

- the gnulib git submodule is huge. Not rarely I get out of memory
  errors during 'git clone' in CI/CD jobs. I can restart the jobs
  manually, but this indicates that there is a resource drain here.
  For a tiny project like libntlm, the imbalance between the small
  project code and the large gnulib is troubling.

- CI/CD platforms often have different ways of working with git
  submodules, which adds complexity, which leads to bugs. Allowing
  maintainers to decide if they want to work with git submodules or
  not seems like a good thing.

- we don't offer any way for people receiving tarballs to learn which
  gnulib git commit was used (you noticed this too below), but with a
  GNULIB_REVISION approach this is part of the tarball, just like any
  other versioned dependency on autoconf, automake etc.

- I think gnulib could be regarded as any other external dependency,
  just like autoconf, automake, libtool etc. that also generate files
  in my build tree during bootstrapping.
  I don't put autoconf as a git submodule, why should I put gnulib as
  one?

Granted, these concerns are a bit vague and subjective.

> Currently libntlm has this in its bootstrap.conf:
>
>   GNULIB_REVISION=dfb71172a46ef41f8cf8ab7ca529c1dd3097a41d
>
> and GNU make has this:
>
>   GNULIB_REVISION=stable-202307

Interesting. This suggests the GNULIB_REVISION approach isn't the entire
solution either. I think it is useful to record the gnulib git commit
used to prepare a tarball, and have that git commit id be part of the
shipped tarball, and stored inside the git repository. The first use
above achieves this, but the second one doesn't (branches/tags are
moving targets).

If I download the gzip tarball I can't find anywhere what gnulib commit
was used for bootstrapping. It is quite cumbersome to verify that the
tarball doesn't contain any modified gnulib code. This is even harder
when projects INTENTIONALLY modify gnulib code compared to what's in
gnulib git, which coreutils and several other projects do through
gnulib *.diff/*.patch files.

Ultimately, I think there is an important use-case to build projects
directly from source code without having tarballs with pre-generated
files that are not reproduced by the user.

> The differences between both approaches are:
>
>   - GNULIB_REVISION works only with the 'bootstrap' program. The submodules
>     approach works also without 'bootstrap'.

What use case are you thinking of? The consumers of the gnulib git
commit information that I can think of are gnulib-aware.

>   - For GNULIB_REVISION, the user is on their own regarding tooling, aside
>     from 'bootstrap'. In the submodules approach, the 'git' suite provides
>     the tooling, and many developers are familiar with it.

Yes, but developers also like flexibility, and in some situations I
think the git approach is not the best way of working.
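
To illustrate the difference between the two GNULIB_REVISION values
quoted above, here is a minimal sketch of a check that tells a pinned
commit id from a branch or tag name. It is not part of gnulib or
'bootstrap'; the function names are made up for this example, and it
assumes that bootstrap.conf uses a plain unquoted assignment and that a
pinned revision is a full 40-digit SHA-1 commit id. An abbreviated
commit id is treated as "not pinned".

```python
import re

# Assumption: a pinned revision is a full SHA-1 commit id (40 hex digits).
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")

# Assumption: bootstrap.conf contains an unquoted, uncommented assignment
# at the start of a line, e.g. "GNULIB_REVISION=stable-202307".
ASSIGNMENT = re.compile(r"^[ \t]*GNULIB_REVISION=([^\s#'\"]+)", re.MULTILINE)


def gnulib_revision(conf_text):
    """Return the last GNULIB_REVISION value assigned in conf_text, or None."""
    values = ASSIGNMENT.findall(conf_text)
    return values[-1] if values else None


def is_pinned(revision):
    """Return True if revision looks like a full commit id, not a branch/tag."""
    return revision is not None and FULL_SHA1.fullmatch(revision) is not None


if __name__ == "__main__":
    # Hypothetical usage: check the bootstrap.conf in the current directory.
    with open("bootstrap.conf", encoding="utf-8") as conf:
        revision = gnulib_revision(conf.read())
    if not is_pinned(revision):
        print("warning: GNULIB_REVISION is not a full commit id:", revision)
```

Something along these lines could run as part of a release checklist,
so that a tarball is only cut when the recorded revision cannot move.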

>   - .tar.gz files created by the gitweb "snapshot" link, by the cgit
>     "refs > Download" section, or the GitHub "Download ZIP" button contain
>     an empty directory in place of the submodule, and no information about
>     the revision. Whereas they contain the file with the GNULIB_REVISION
>     assignment.

Indeed, this was the main challenge for me. That is critical information
for anyone who wants to avoid touching tarballs with pre-generated
content.

>> I should write a post to debian-devel describing this pattern on
>> how to use gnulib in Debian packages
>
> It feels wrong to me if, in order to get meta-information about required
> dependencies of a package, Debian tools grep a particular file for a specific
> string. This approach is simply too limited.

Meta-information about dependencies is normally hand-curated in Debian
(the Build-Depends: header). The simplest solution is for the Debian
package maintainer to figure out which gnulib git commit was used for a
release and pin that manually in the debian/rules makefile. If the
information is available in bootstrap.conf via GNULIB_REVISION, that
saves time. I believe this is conceptually the same thing as pinning
version information for any other dependency.

> The correct way, IMO, would be that 'git' provides this meta-information,
> either embedded in the .tar.gz generated by the web tooling, or in a
> separate .tar.gz. AFAICT, 'git' currently does not have this ability.
> Therefore we need to approach the 'git' team, in order to find a solution
> that scales across the whole set of software package -- not specific to
> gnulib and not specific to 'bootstrap'.

Yes, I was quite disappointed when I realized that 'git archive' doesn't
record the git submodule commit anywhere. Couldn't the .gitmodules file
be extended to allow specifying the git commit of the submodule?
I think GitLab/GitHub/etc use 'git archive' under the hood, so we could
ask that the .gitmodules file be extended to hold the commit (just like
it holds the branch name now). Alternatively, get 'git archive' to
record the submodule commit in some other way.

I suppose we could recommend that gnulib users who use gnulib git
submodules put this in .gitmodules:

  # GNULIB_REVISION=dfb71172a46ef41f8cf8ab7ca529c1dd3097a41d

Then this will be part of the 'git archive' output (I think?), and we
would also have to recommend 'EXTRA_DIST += .gitmodules' so that this
information is included in the tarballs.

However, I want to deploy a solution that works now while we wait for
git to add this feature (or not).

I think we may also find that requiring 'git' for building packages
causes a cyclic dependency for bootstrap people. Requiring 'tar' is
okay because 'tar' has very few other pre-dependencies. I don't know
how to solve this problem though: some way to ship all gnulib git
commits in a compact form that can be extracted easily without complex
tools is necessary. Maybe a minimal git like 'bootstrap-git' could be
written that just supports cloning from a git bundle and nothing else.

/Simon

> Bruno
>
> [1] https://stackoverflow.com/questions/1777854/how-can-i-specify-a-branch-tag-when-adding-a-git-submodule
> [2] https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=tree
> [3] https://git.savannah.gnu.org/cgit/coreutils.git/tree/