Hi Ævar, On Thu, 2 May 2019, Ævar Arnfjörð Bjarmason wrote: > On Fri, Apr 26 2019, Johannes Schindelin wrote: > > > On Sat, 13 Apr 2019, brian m. carlson wrote: > > > >> On Fri, Apr 12, 2019 at 09:51:02PM -0400, Jeff King wrote: > >> > I wondered how you were going to kick this in, since users can > >> > define arbitrary filters. I think it's kind of neat to > >> > automagically convert "gzip -cn" (which also happens to be the > >> > default). But I think we should mention that in the Documentation, > >> > in case somebody tries to use a custom version of gzip and wonders > >> > why it isn't kicking in. > >> > > >> > Likewise, it might make sense in the tests to put a poison gzip in > >> > the $PATH so that we can be sure we're using our internal code, and > >> > not just calling out to gzip (on platforms that have it, of > >> > course). > >> > > >> > The alternative is that we could use a special token like ":zlib" > >> > or something to indicate that the internal implementation should be > >> > used (and then tweak the baked-in default, too). That might be less > >> > surprising for users, but most people would still get the benefit > >> > since they'd be using the default config. > >> > >> I agree that a special value (or NULL, if that's possible) would be > >> nicer here. That way, if someone does specify a custom gzip, we honor > >> it, and it serves to document the code better. For example, if > >> someone symlinked pigz to gzip and used "gzip -cn", then they might > >> not get the parallelization benefits they expected. > > > > I went with `:zlib`. The `NULL` value would not really work, as there > > is no way to specify that via `archive.tgz.command`. > > > > About the symlinked thing: I do not really want to care to support > > such hacks. > > It's the standard way by which a lot of systems do this, e.g. on my > Debian box: > > $ find /{,s}bin /usr/{,s}bin -type l -exec file {} \;|grep /etc/alternatives|wc -l > 108 > > To write this E-Mail I'm invoking one such symlink :) I am well aware of the way Debian-based systems handle alternatives, and I myself also use something similar to write this E-Mail (but it is not a symlink, it is a Git alias). But that's not the hack that I was talking about. The hack I meant was: if you symlink `gzip` to `pigz` in your `PATH` *and then expect `git archive --format=tgz` to pick that up*. As far as I am concerned, the fact that `git archive --format=tgz` spawns `gzip` to perform the compression is an implementation detail, and not something that users should feel they can rely on. > > If you want a different compressor than the default (which can > > change), you should specify it specifically. > > You might want to do so system-wide, or for each program at a time. > > I don't care about this for gzip myself, just pointing out it *is* a > thing people use. Sure. > >> I'm fine overall with the idea of bringing the compression into the > >> binary using zlib, provided that we preserve the "-n" behavior > >> (producing reproducible archives). > > > > Thanks for voicing this concern. I had a look at zlib's source code, > > and it looks like it requires an extra function call (that we don't > > call) to make the resulting file non-reproducible. In other words, it > > has the opposite default behavior from `gzip`. > > Just commenting on the overall thread: I like René's "new built-in" > patch best. I guess we now have to diverging votes: yours for the `git archive --gzip` "built-in" and Peff's for the async code ;-) > You mentioned "new command that we have to support for eternity". I > think calling it "git gzip" is a bad idea. We'd make it "git > archive--gzip" or "git archive--helper", and we could hide building it > behind some compat flag. > > Then we'd carry no if/else internal/external code, and the portability > issue that started this would be addressed, no? Sure. The async version would leave the door wide open for implementing pigz' trick to multi-thread the compression, though. > As a bonus we could also drop the "GZIP" prereq from the test suite > entirely and just put that "gzip" in $PATH for the purposes of the > tests. > > I spied on your yet-to-be-submitted patches and you could drop GZIP from > the "git archive" tests, but we'd still need it in > t/t5562-http-backend-content-length.sh, but not if we had a "gzip" > compat helper. We need it at least once for *decompressing* the `--format=tgz` output in order to compare it to the `--format=tar` output. Besides, I think it is really important to keep the test that verifies that the output is correct (i.e. that gzip can decompress it). > There's also a long-standing bug/misfeature in git-archive that I wonder > about: When you combine --format with --remote you can only generate > e.g. tar.gz if the remote is OK with it, if it says no you can't even if > it supports "tar" and you could do the "gz" part locally. Would such a > patch be harder with :zlib than if we always just spewed out to external > "gzip" after satisfying some criteria? I think it would be precisely the same: you'd still use the same "filter" code path. Ciao, Dscho