git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Jeff King <peff@peff.net>
Cc: Rohit Ashiwal via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Subject: Re: [PATCH 2/2] archive: avoid spawning `gzip`
Date: Fri, 26 Apr 2019 10:47:04 -0400 (DST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1904261035090.45@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <20190413015102.GC2040@sigill.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 4555 bytes --]

Hi Peff,

On Fri, 12 Apr 2019, Jeff King wrote:

> On Fri, Apr 12, 2019 at 04:04:40PM -0700, Rohit Ashiwal via GitGitGadget wrote:
>
> > From: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
> >
> > As we already link to the zlib library, we can perform the compression
> > without even requiring gzip on the host machine.
>
> Very cool. It's nice to drop a dependency, and this should be a bit more
> efficient, too.

Sadly, no, as René intuited and as your testing shows: there seems to be a
~15% penalty for compressing in the same thread as producing the data to
be compressed.

Given that it reduces the number of dependencies, and given that it might
be better to rely on the external command `pigz -cn` if speed is really a
matter, I still think it makes sense to switch the default, though.

> > diff --git a/archive-tar.c b/archive-tar.c
> > index ba37dad27c..5979ed14b7 100644
> > --- a/archive-tar.c
> > +++ b/archive-tar.c
> > @@ -466,18 +466,34 @@ static int write_tar_filter_archive(const struct archiver *ar,
> >  	filter.use_shell = 1;
> >  	filter.in = -1;
> >
> > -	if (start_command(&filter) < 0)
> > -		die_errno(_("unable to start '%s' filter"), argv[0]);
> > -	close(1);
> > -	if (dup2(filter.in, 1) < 0)
> > -		die_errno(_("unable to redirect descriptor"));
> > -	close(filter.in);
> > +	if (!strcmp("gzip -cn", ar->data)) {
>
> I wondered how you were going to kick this in, since users can define
> arbitrary filters. I think it's kind of neat to automagically convert
> "gzip -cn" (which also happens to be the default). But I think we should
> mention that in the Documentation, in case somebody tries to use a
> custom version of gzip and wonders why it isn't kicking in.
>
> Likewise, it might make sense in the tests to put a poison gzip in the
> $PATH so that we can be sure we're using our internal code, and not just
> calling out to gzip (on platforms that have it, of course).
>
> The alternative is that we could use a special token like ":zlib" or
> something to indicate that the internal implementation should be used
> (and then tweak the baked-in default, too). That might be less
> surprising for users, but most people would still get the benefit since
> they'd be using the default config.

I went with `:zlib`.

> > +		char outmode[4] = "wb\0";
>
> This looks sufficiently magical that it might merit a comment. I had to
> look in the zlib header file to learn that this is just a normal
> stdio-style mode. But we can't just do:
>
>   gzip = gzdopen(fd, "wb");
>
> because we want to (maybe) append a compression level. It's also
> slightly confusing that it explicitly includes a NUL, but later:
>
> > +		if (args->compression_level >= 0 && args->compression_level <= 9)
> > +			outmode[2] = '0' + args->compression_level;
>
> we may overwrite that and assume that outmode[3] is also a NUL. Which it
> is, because of how C initialization works. But that means we also do not
> need the "\0" in the initializer.
>
> Dropping that may make it slightly less jarring (any time I see a
> backslash escape in an initializer, I assume I'm in for some binary
> trickery, but this turns out to be much more mundane).
>
> I'd also consider just using a strbuf:
>
>   struct strbuf outmode = STRBUF_INIT;
>
>   strbuf_addstr(&outmode, "wb");
>   if (args->compression_level >= 0 && args->compression_level <= 9)
> 	strbuf_addch(&outmode, '0' + args->compression_level);
>
> That's overkill in a sense, but it saves us having to deal with
> manually-counted offsets, and this code is only run once per program
> invocation, so the efficiency shouldn't matter.

I'll change that, too, as it seems that `pigz` allows compression levels
higher than 9, in which case we need `strbuf_addf()` anyway. I will not
adjust the condition `<= 9`, of course, as zlib is still limited that way.

> > +		gzip = gzdopen(fileno(stdout), outmode);
> > +		if (!gzip)
> > +			die(_("Could not gzdopen stdout"));
>
> Is there a way to get a more specific error from zlib? I'm less
> concerned about gzdopen here (which should never fail), and more about
> the writing and closing steps. I don't see anything good for gzdopen(),
> but...

Sadly, I did not find anything there.

> > +	if (gzip) {
> > +		if (gzclose(gzip) != Z_OK)
> > +			die(_("gzclose failed"));
>
> ...according to zlib.h, here the returned int is meaningful. And if
> Z_ERRNO, we should probably use die_errno() to give a better message.

Okay.

Thanks,
Dscho

  parent reply	other threads:[~2019-04-26 14:47 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12 23:04 [PATCH 0/2] Avoid spawning gzip in git archive Johannes Schindelin via GitGitGadget
2019-04-12 23:04 ` [PATCH 1/2] archive: replace write_or_die() calls with write_block_or_die() Rohit Ashiwal via GitGitGadget
2019-04-13  1:34   ` Jeff King
2019-04-13  5:51     ` Junio C Hamano
2019-04-14  4:36       ` Rohit Ashiwal
2019-04-26 14:29       ` Johannes Schindelin
2019-04-26 23:44         ` Junio C Hamano
2019-04-29 21:32           ` Johannes Schindelin
2019-05-01 18:09             ` Jeff King
2019-05-02 20:29               ` René Scharfe
2019-05-05  5:25               ` Junio C Hamano
2019-05-06  5:07                 ` Jeff King
2019-04-14  4:34     ` Rohit Ashiwal
2019-04-14 10:33       ` Junio C Hamano
2019-04-26 14:28     ` Johannes Schindelin
2019-05-01 18:07       ` Jeff King
2019-04-12 23:04 ` [PATCH 2/2] archive: avoid spawning `gzip` Rohit Ashiwal via GitGitGadget
2019-04-13  1:51   ` Jeff King
2019-04-13 22:01     ` René Scharfe
2019-04-15 21:35       ` Jeff King
2019-04-26 14:51         ` Johannes Schindelin
2019-04-27  9:59           ` René Scharfe
2019-04-27 17:39             ` René Scharfe
2019-04-29 21:25               ` Johannes Schindelin
2019-05-01 17:45                 ` René Scharfe
2019-05-01 18:18                   ` Jeff King
2019-06-10 10:44                     ` René Scharfe
2019-06-13 19:16                       ` Jeff King
2019-04-13 22:16     ` brian m. carlson
2019-04-15 21:36       ` Jeff King
2019-04-26 14:54       ` Johannes Schindelin
2019-05-02 20:20         ` Ævar Arnfjörð Bjarmason
2019-05-03 20:49           ` Johannes Schindelin
2019-05-03 20:52             ` Jeff King
2019-04-26 14:47     ` Johannes Schindelin [this message]
     [not found] ` <pull.145.v2.git.gitgitgadget@gmail.com>
     [not found]   ` <4ea94a8784876c3a19e387537edd81a957fc692c.1556321244.git.gitgitgadget@gmail.com>
2019-05-02 20:29     ` [PATCH v2 3/4] archive: optionally use zlib directly for gzip compression René Scharfe
     [not found]   ` <ac2b2488a1b42b3caf8a84594c48eca796748e59.1556321244.git.gitgitgadget@gmail.com>
2019-05-02 20:30     ` [PATCH v2 2/4] archive-tar: mark RECORDSIZE/BLOCKSIZE as unsigned René Scharfe
2019-05-08 11:45       ` Johannes Schindelin
2019-05-08 23:04         ` Jeff King
2019-05-09 14:06           ` Johannes Schindelin
2019-05-09 18:38             ` Jeff King
2019-05-10 17:18               ` René Scharfe
2019-05-10 21:20                 ` Jeff King
2022-06-12  6:00 ` [PATCH v3 0/5] Avoid spawning gzip in git archive René Scharfe
2022-06-12  6:03   ` [PATCH v3 1/5] archive: rename archiver data field to filter_command René Scharfe
2022-06-12  6:05   ` [PATCH v3 2/5] archive-tar: factor out write_block() René Scharfe
2022-06-12  6:08   ` [PATCH v3 3/5] archive-tar: add internal gzip implementation René Scharfe
2022-06-13 19:10     ` Junio C Hamano
2022-06-12  6:18   ` [PATCH v3 4/5] archive-tar: use OS_CODE 3 (Unix) for internal gzip René Scharfe
2022-06-12  6:19   ` [PATCH v3 5/5] archive-tar: use internal gzip by default René Scharfe
2022-06-13 21:55     ` Junio C Hamano
2022-06-14 11:27       ` Johannes Schindelin
2022-06-14 15:47         ` René Scharfe
2022-06-14 15:56           ` René Scharfe
2022-06-14 16:29           ` Johannes Schindelin
2022-06-14 20:04             ` René Scharfe
2022-06-15 16:41               ` Junio C Hamano
2022-06-14 11:28   ` [PATCH v3 0/5] Avoid spawning gzip in git archive Johannes Schindelin
2022-06-14 20:05     ` René Scharfe
2022-06-30 18:55       ` Johannes Schindelin
2022-07-01 16:05         ` Johannes Schindelin
2022-07-01 16:27           ` Jeff King
2022-07-01 17:47             ` Junio C Hamano
2022-06-15 16:53 ` [PATCH v4 0/6] " René Scharfe
2022-06-15 16:58   ` [PATCH v4 1/6] archive: update format documentation René Scharfe
2022-06-15 16:59   ` [PATCH v4 2/6] archive: rename archiver data field to filter_command René Scharfe
2022-06-15 17:01   ` [PATCH v4 3/6] archive-tar: factor out write_block() René Scharfe
2022-06-15 17:02   ` [PATCH v4 4/6] archive-tar: add internal gzip implementation René Scharfe
2022-06-15 20:32     ` Ævar Arnfjörð Bjarmason
2022-06-16 18:55       ` René Scharfe
2022-06-24 11:13         ` Ævar Arnfjörð Bjarmason
2022-06-24 20:24           ` René Scharfe
2022-06-15 17:04   ` [PATCH v4 5/6] archive-tar: use OS_CODE 3 (Unix) for internal gzip René Scharfe
2022-06-15 17:05   ` [PATCH v4 6/6] archive-tar: use internal gzip by default René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1904261035090.45@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=rohit.ashiwal265@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).