git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Jeff King <peff@peff.net>,
	Rohit Ashiwal via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Subject: Re: [PATCH 2/2] archive: avoid spawning `gzip`
Date: Fri, 3 May 2019 22:49:17 +0200 (DST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1905031437590.45@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <87ftpwip5m.fsf@evledraar.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5284 bytes --]

Hi Ævar,

On Thu, 2 May 2019, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Apr 26 2019, Johannes Schindelin wrote:
>
> > On Sat, 13 Apr 2019, brian m. carlson wrote:
> >
> >> On Fri, Apr 12, 2019 at 09:51:02PM -0400, Jeff King wrote:
> >> > I wondered how you were going to kick this in, since users can
> >> > define arbitrary filters. I think it's kind of neat to
> >> > automagically convert "gzip -cn" (which also happens to be the
> >> > default). But I think we should mention that in the Documentation,
> >> > in case somebody tries to use a custom version of gzip and wonders
> >> > why it isn't kicking in.
> >> >
> >> > Likewise, it might make sense in the tests to put a poison gzip in
> >> > the $PATH so that we can be sure we're using our internal code, and
> >> > not just calling out to gzip (on platforms that have it, of
> >> > course).
> >> >
> >> > The alternative is that we could use a special token like ":zlib"
> >> > or something to indicate that the internal implementation should be
> >> > used (and then tweak the baked-in default, too). That might be less
> >> > surprising for users, but most people would still get the benefit
> >> > since they'd be using the default config.
> >>
> >> I agree that a special value (or NULL, if that's possible) would be
> >> nicer here. That way, if someone does specify a custom gzip, we honor
> >> it, and it serves to document the code better. For example, if
> >> someone symlinked pigz to gzip and used "gzip -cn", then they might
> >> not get the parallelization benefits they expected.
> >
> > I went with `:zlib`. The `NULL` value would not really work, as there
> > is no way to specify that via `archive.tgz.command`.
> >
> > About the symlinked thing: I do not really want to care to support
> > such hacks.
>
> It's the standard way by which a lot of systems do this, e.g. on my
> Debian box:
>
>     $ find /{,s}bin /usr/{,s}bin -type l -exec file {} \;|grep /etc/alternatives|wc -l
>     108
>
> To write this E-Mail I'm invoking one such symlink :)

I am well aware of the way Debian-based systems handle alternatives, and I
myself also use something similar to write this E-Mail (but it is not a
symlink, it is a Git alias).

But that's not the hack that I was talking about.

The hack I meant was: if you symlink `gzip` to `pigz` in your `PATH` *and
then expect `git archive --format=tgz` to pick that up*.

As far as I am concerned, the fact that `git archive --format=tgz` spawns
`gzip` to perform the compression is an implementation detail, and not
something that users should feel they can rely on.

> > If you want a different compressor than the default (which can
> > change), you should specify it specifically.
>
> You might want to do so system-wide, or for each program at a time.
>
> I don't care about this for gzip myself, just pointing out it *is* a
> thing people use.

Sure.

> >> I'm fine overall with the idea of bringing the compression into the
> >> binary using zlib, provided that we preserve the "-n" behavior
> >> (producing reproducible archives).
> >
> > Thanks for voicing this concern. I had a look at zlib's source code,
> > and it looks like it requires an extra function call (that we don't
> > call) to make the resulting file non-reproducible. In other words, it
> > has the opposite default behavior from `gzip`.
>
> Just commenting on the overall thread: I like René's "new built-in"
> patch best.

I guess we now have to diverging votes: yours for the `git archive --gzip`
"built-in" and Peff's for the async code ;-)

> You mentioned "new command that we have to support for eternity". I
> think calling it "git gzip" is a bad idea. We'd make it "git
> archive--gzip" or "git archive--helper", and we could hide building it
> behind some compat flag.
>
> Then we'd carry no if/else internal/external code, and the portability
> issue that started this would be addressed, no?

Sure.

The async version would leave the door wide open for implementing pigz'
trick to multi-thread the compression, though.

> As a bonus we could also drop the "GZIP" prereq from the test suite
> entirely and just put that "gzip" in $PATH for the purposes of the
> tests.
>
> I spied on your yet-to-be-submitted patches and you could drop GZIP from
> the "git archive" tests, but we'd still need it in
> t/t5562-http-backend-content-length.sh, but not if we had a "gzip"
> compat helper.

We need it at least once for *decompressing* the `--format=tgz` output in
order to compare it to the `--format=tar` output. Besides, I think it is
really important to keep the test that verifies that the output is correct
(i.e. that gzip can decompress it).

> There's also a long-standing bug/misfeature in git-archive that I wonder
> about: When you combine --format with --remote you can only generate
> e.g. tar.gz if the remote is OK with it, if it says no you can't even if
> it supports "tar" and you could do the "gz" part locally. Would such a
> patch be harder with :zlib than if we always just spewed out to external
> "gzip" after satisfying some criteria?

I think it would be precisely the same: you'd still use the same "filter"
code path.

Ciao,
Dscho

  reply	other threads:[~2019-05-03 20:49 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12 23:04 [PATCH 0/2] Avoid spawning gzip in git archive Johannes Schindelin via GitGitGadget
2019-04-12 23:04 ` [PATCH 1/2] archive: replace write_or_die() calls with write_block_or_die() Rohit Ashiwal via GitGitGadget
2019-04-13  1:34   ` Jeff King
2019-04-13  5:51     ` Junio C Hamano
2019-04-14  4:36       ` Rohit Ashiwal
2019-04-26 14:29       ` Johannes Schindelin
2019-04-26 23:44         ` Junio C Hamano
2019-04-29 21:32           ` Johannes Schindelin
2019-05-01 18:09             ` Jeff King
2019-05-02 20:29               ` René Scharfe
2019-05-05  5:25               ` Junio C Hamano
2019-05-06  5:07                 ` Jeff King
2019-04-14  4:34     ` Rohit Ashiwal
2019-04-14 10:33       ` Junio C Hamano
2019-04-26 14:28     ` Johannes Schindelin
2019-05-01 18:07       ` Jeff King
2019-04-12 23:04 ` [PATCH 2/2] archive: avoid spawning `gzip` Rohit Ashiwal via GitGitGadget
2019-04-13  1:51   ` Jeff King
2019-04-13 22:01     ` René Scharfe
2019-04-15 21:35       ` Jeff King
2019-04-26 14:51         ` Johannes Schindelin
2019-04-27  9:59           ` René Scharfe
2019-04-27 17:39             ` René Scharfe
2019-04-29 21:25               ` Johannes Schindelin
2019-05-01 17:45                 ` René Scharfe
2019-05-01 18:18                   ` Jeff King
2019-06-10 10:44                     ` René Scharfe
2019-06-13 19:16                       ` Jeff King
2019-04-13 22:16     ` brian m. carlson
2019-04-15 21:36       ` Jeff King
2019-04-26 14:54       ` Johannes Schindelin
2019-05-02 20:20         ` Ævar Arnfjörð Bjarmason
2019-05-03 20:49           ` Johannes Schindelin [this message]
2019-05-03 20:52             ` Jeff King
2019-04-26 14:47     ` Johannes Schindelin
     [not found] ` <pull.145.v2.git.gitgitgadget@gmail.com>
     [not found]   ` <4ea94a8784876c3a19e387537edd81a957fc692c.1556321244.git.gitgitgadget@gmail.com>
2019-05-02 20:29     ` [PATCH v2 3/4] archive: optionally use zlib directly for gzip compression René Scharfe
     [not found]   ` <ac2b2488a1b42b3caf8a84594c48eca796748e59.1556321244.git.gitgitgadget@gmail.com>
2019-05-02 20:30     ` [PATCH v2 2/4] archive-tar: mark RECORDSIZE/BLOCKSIZE as unsigned René Scharfe
2019-05-08 11:45       ` Johannes Schindelin
2019-05-08 23:04         ` Jeff King
2019-05-09 14:06           ` Johannes Schindelin
2019-05-09 18:38             ` Jeff King
2019-05-10 17:18               ` René Scharfe
2019-05-10 21:20                 ` Jeff King
2022-06-12  6:00 ` [PATCH v3 0/5] Avoid spawning gzip in git archive René Scharfe
2022-06-12  6:03   ` [PATCH v3 1/5] archive: rename archiver data field to filter_command René Scharfe
2022-06-12  6:05   ` [PATCH v3 2/5] archive-tar: factor out write_block() René Scharfe
2022-06-12  6:08   ` [PATCH v3 3/5] archive-tar: add internal gzip implementation René Scharfe
2022-06-13 19:10     ` Junio C Hamano
2022-06-12  6:18   ` [PATCH v3 4/5] archive-tar: use OS_CODE 3 (Unix) for internal gzip René Scharfe
2022-06-12  6:19   ` [PATCH v3 5/5] archive-tar: use internal gzip by default René Scharfe
2022-06-13 21:55     ` Junio C Hamano
2022-06-14 11:27       ` Johannes Schindelin
2022-06-14 15:47         ` René Scharfe
2022-06-14 15:56           ` René Scharfe
2022-06-14 16:29           ` Johannes Schindelin
2022-06-14 20:04             ` René Scharfe
2022-06-15 16:41               ` Junio C Hamano
2022-06-14 11:28   ` [PATCH v3 0/5] Avoid spawning gzip in git archive Johannes Schindelin
2022-06-14 20:05     ` René Scharfe
2022-06-30 18:55       ` Johannes Schindelin
2022-07-01 16:05         ` Johannes Schindelin
2022-07-01 16:27           ` Jeff King
2022-07-01 17:47             ` Junio C Hamano
2022-06-15 16:53 ` [PATCH v4 0/6] " René Scharfe
2022-06-15 16:58   ` [PATCH v4 1/6] archive: update format documentation René Scharfe
2022-06-15 16:59   ` [PATCH v4 2/6] archive: rename archiver data field to filter_command René Scharfe
2022-06-15 17:01   ` [PATCH v4 3/6] archive-tar: factor out write_block() René Scharfe
2022-06-15 17:02   ` [PATCH v4 4/6] archive-tar: add internal gzip implementation René Scharfe
2022-06-15 20:32     ` Ævar Arnfjörð Bjarmason
2022-06-16 18:55       ` René Scharfe
2022-06-24 11:13         ` Ævar Arnfjörð Bjarmason
2022-06-24 20:24           ` René Scharfe
2022-06-15 17:04   ` [PATCH v4 5/6] archive-tar: use OS_CODE 3 (Unix) for internal gzip René Scharfe
2022-06-15 17:05   ` [PATCH v4 6/6] archive-tar: use internal gzip by default René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1905031437590.45@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=rohit.ashiwal265@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).