From: Jeff King <peff@peff.net>
To: "René Scharfe" <l.s.r@web.de>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Rohit Ashiwal via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Subject: Re: [PATCH 2/2] archive: avoid spawning `gzip`
Date: Thu, 13 Jun 2019 15:16:30 -0400
Message-ID: <20190613191630.GA30854@sigill.intra.peff.net>
In-Reply-To: <c00a062a-4f01-4754-3429-e7bb2a26aac1@web.de>

On Mon, Jun 10, 2019 at 12:44:54PM +0200, René Scharfe wrote:
> Am 01.05.19 um 20:18 schrieb Jeff King:
> > On Wed, May 01, 2019 at 07:45:05PM +0200, René Scharfe wrote:
> >
> >>> But since the performance is still not quite on par with `gzip`, I would
> >>> actually rather not, and really, just punt on that one, stating that
> >>> people interested in higher performance should use `pigz`.
> >>
> >> Here are my performance numbers for generating .tar.gz files again:
>
> OK, tried one more version, with pthreads (patch at the end). Also
> redid all measurements for better comparability; everything is faster
> now for some reason (perhaps due to a compiler update? clang version
> 7.0.1-8 now):

Hmm. Interesting that using pthreads is still slower than just shelling
out to gzip:

> master, using gzip(1):
> Benchmark #1: git archive --format=tgz HEAD
> Time (mean ± σ): 15.697 s ± 0.246 s [User: 19.213 s, System: 0.386 s]
> Range (min … max): 15.405 s … 16.103 s 10 runs
> [...]
> using zlib in a separate thread (that's the new one):
> Benchmark #1: git archive --format=tgz HEAD
> Time (mean ± σ): 16.310 s ± 0.237 s [User: 20.075 s, System: 0.173 s]
> Range (min … max): 15.983 s … 16.790 s 10 runs

I wonder if zlib is just slower, or if the cost of context switching
between threads is somehow higher than just dumping big chunks over a
pipe. In particular, our gzip-alike is still faster than the pthreads
version:

> using a gzip-lookalike:
> Benchmark #1: git archive --format=tgz HEAD
> Time (mean ± σ): 16.289 s ± 0.218 s [User: 19.485 s, System: 0.337 s]
> Range (min … max): 16.020 s … 16.555 s 10 runs

though it looks like the timings do overlap.
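
To put rough numbers on that overlap, here is a short sketch (Python, picked only for brevity; the dictionary labels are illustrative) that re-reads the hyperfine summaries quoted above. It checks whether the min/max ranges of the two zlib-based variants intersect, how large the gap between their means is relative to the larger standard deviation, and what each variant costs in total CPU time (user + system, as reported), which is the figure that matters if the lowest overall CPU time wins. It only compares ten-run summaries; it is not a significance test.

```python
# Ten-run hyperfine summaries for `git archive --format=tgz HEAD`, copied from
# the measurements quoted above. All values are in seconds; labels are ad hoc.
RESULTS = {
    "gzip(1) via pipe": dict(mean=15.697, sd=0.246, lo=15.405, hi=16.103,
                             user=19.213, sys=0.386),
    "zlib in a thread": dict(mean=16.310, sd=0.237, lo=15.983, hi=16.790,
                             user=20.075, sys=0.173),
    "gzip-lookalike": dict(mean=16.289, sd=0.218, lo=16.020, hi=16.555,
                           user=19.485, sys=0.337),
}


def ranges_overlap(a, b):
    """True if the [min, max] wall-clock ranges of two results intersect."""
    return a["lo"] <= b["hi"] and b["lo"] <= a["hi"]


def cpu_time(r):
    """Total CPU time per run (user + system), as reported by hyperfine."""
    return r["user"] + r["sys"]


thread = RESULTS["zlib in a thread"]
lookalike = RESULTS["gzip-lookalike"]
gap = thread["mean"] - lookalike["mean"]

print(f"mean gap: {gap:.3f} s ({gap / max(thread['sd'], lookalike['sd']):.2f} sd)")
print("ranges overlap:", ranges_overlap(thread, lookalike))
for name, r in RESULTS.items():
    print(f"{name:<18} wall {r['mean']:.3f} s   cpu {cpu_time(r):.3f} s")
```

With these inputs the gap comes out at about 0.02 s, well under a tenth of a standard deviation, and the ranges overlap. The gzip(1) pipeline also has the lowest user+system total (about 19.6 s, versus 19.8 s for the lookalike and 20.2 s for the thread).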

> > At GitHub we certainly do cache the git-archive output. We'd also be
> > just fine with the sequential solution. We generally turn down
> > pack.threads to 1, and keep our CPUs busy by serving multiple users
> > anyway.
> >
> > So whatever has the lowest overall CPU time is generally preferable, but
> > the times are close enough that I don't think we'd care much either way
> > (and it's probably not worth having a config option or similar).
>
> Moving back to 2009 and reducing the number of utilized cores both feels
> weird, but the sequential solution *is* the most obvious, easiest and
> (by a narrow margin) lightest one if gzip(1) is not an option anymore.
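
For reference, the "sequential solution" has a very simple shape: each tar block goes through zlib inside the archiving process as soon as it is produced, with no pipe to a child and no extra thread. Below is a minimal Python sketch of that shape, assuming zlib's built-in gzip framing (wbits=31). The name gzip_stream and the 10240-byte dummy records are illustrative, and this is not the code from the patch series.

```python
import zlib


def gzip_stream(chunks, level=6):
    """Compress an iterable of byte chunks into one gzip member, sequentially,
    in the calling thread: no child process and no worker thread.

    wbits=31 (16 + 15) makes zlib write a gzip header and trailer instead of
    a zlib wrapper. Without an explicit header, zlib stores no file name and
    a zero modification time, similar in spirit to `gzip -n`.
    """
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    for chunk in chunks:
        out = comp.compress(chunk)  # may return b"" while zlib buffers input
        if out:
            yield out
    yield comp.flush()  # remaining deflate data plus the CRC-32/size trailer


# Usage: compress three dummy 10240-byte tar records and check the round trip.
records = [bytes(10240) for _ in range(3)]
data = b"".join(gzip_stream(records))
assert zlib.decompress(data, 31) == b"".join(records)
print(len(data), "compressed bytes; header", data[:4].hex())
```

The trade-off is that tar generation and compression no longer overlap in time, so only one core is busy. That is the "reducing the number of utilized cores" part.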

It sounds like we resolved to give the "internal gzip" its own name
(whether it's a gzip-alike command, or a special name we recognize to
trigger the internal code). So maybe we could continue to default to
"gzip -cn", but platforms could do otherwise when shipping gzip there is
a pain (e.g. Windows, but maybe also anybody else who wants to set
NO_EXTERNAL_GZIP or detect it from autoconf).
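
The dispatch described in that paragraph is small: keep "gzip -cn" as the default command, reserve one name that selects the in-process zlib code, and let the build flip the default. Here is a hedged Python sketch of that decision. The reserved name internal-gzip, the function choose_compressor and the no_external_gzip flag are hypothetical stand-ins, not git's actual config keys or Makefile knobs.

```python
import shlex

DEFAULT_COMMAND = "gzip -cn"        # today's default filter, per the mail
INTERNAL_NAME = "internal-gzip"     # hypothetical reserved name


def choose_compressor(configured=None, no_external_gzip=False):
    """Pick how to produce a .tar.gz.

    configured: the user's configured filter command, or None if unset.
    no_external_gzip: stands in for a hypothetical NO_EXTERNAL_GZIP build
    knob that makes the internal code the default on platforms where
    shipping gzip is a pain.

    Returns ("internal", None) or ("external", argv).
    """
    default = INTERNAL_NAME if no_external_gzip else DEFAULT_COMMAND
    command = configured if configured is not None else default
    if command == INTERNAL_NAME:
        return ("internal", None)
    # Simplification: split the string into argv; a real implementation
    # might instead hand it to a shell.
    return ("external", shlex.split(command))


print(choose_compressor())                        # platform default
print(choose_compressor(no_external_gzip=True))   # e.g. a Windows build
print(choose_compressor("pigz -cn"))              # explicit user choice
```

Because an explicitly configured command always wins, someone on such a platform who does have gzip (or pigz) installed can still opt back into an external filter.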

-Peff