From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: "René Scharfe" <l.s.r@web.de>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
"Rohit Ashiwal" <rohit.ashiwal265@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Jeff King" <peff@peff.net>,
"brian m . carlson" <sandals@crustytoothpaste.net>
Subject: Re: [PATCH v3 0/5] Avoid spawning gzip in git archive
Date: Fri, 1 Jul 2022 18:05:59 +0200 (CEST)
Message-ID: <038r075o-5s5r-9sop-5o02-8s84428o0r54@tzk.qr>
In-Reply-To: <ps52p06s-01nr-4ss2-r802-6nsp5nqq5199@tzk.qr>
Me again,
On Thu, 30 Jun 2022, Johannes Schindelin wrote:
> I finally managed to play around with building and packaging zlib-ng
> [*1*] (since I want to use it as a drop-in replacement for zlib, I think
> it is best to configure it with `--zlib-compat`, that way I do not have
> to fiddle with any equivalent of `LD_PRELOAD`). Here are my numbers:
>
> zlib-ng: 14.409 s ± 0.209 s
> zlib: 26.843 s ± 0.636 s
>
> These are pretty good, which made me think that they might actually even
> help regular Git operations (because we zlib every loose object).
>
> So I tried to `fast-import` some 2500 commits from linux.git into a fresh
> repository, and the zlib-ng version takes ~51s and the zlib version takes
> ~58s. At first I thought that it might be noise, but the trend seems to be
> steady. It's not a huge improvement, of course, but I think that might be
> because most of the time is spent parsing.
>
> I then tried to test the performance focusing on writing loose objects, by
> using p0008 (increasing the number of files from 50 to 1500 and
> restricting it to fsyncMethod=none).
>
> Unfortunately, the numbers are not really conclusive. I do see minor
> speed-ups with zlib-ng, mostly, in the single digit percentages, though
> occasionally in the other direction. In other words, there is no clear-cut
> change, just a vague tendency. My guess: Git writes too small files (their
> contents are of the form "$basedir$test_tick.$counter") and zlib-ng's
> superior performance does not come to bear.
>
> Still, for larger workloads, zlib-ng seems to offer a quite nice and
> substantial performance improvement over zlib.
Stolee pointed out to me that objects inside pack files are also
zlib-compressed, and that measuring the speed of `git rev-list --objects
--all --count` might therefore be a better test.
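For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing helper in Python. It is not the tooling that produced the numbers in this thread; `time_command` and `summarize` are hypothetical names, and the helper merely reports mean ± standard deviation of wall-clock time over a few runs.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=5, cwd=None):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so that terminal I/O does not skew the timing.
        subprocess.run(argv, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Format durations as 'mean s ± stdev s' (stdev is 0 for a single run)."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return f"{mean:.3f} s ± {stdev:.3f} s"
```

It could be pointed at a clone (the path is a placeholder) once per zlib build, e.g. `summarize(time_command(["git", "rev-list", "--objects", "--all", "--count"], cwd="/path/to/linux.git"))`.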
And this is where things get a little messy: in the context of Git for
Windows, my local measurements indicate that zlib is better, with ~41
seconds using zlib vs ~52 seconds using zlib-ng (but the latter has a
rather large variance).
These measurements were done with a relatively straightforward build of
zlib-ng v2.0.6. On a hunch, I then built the tip of zlib-ng's `develop`
branch (which was much less straightforward) and now get virtually the
same speed as zlib with that `rev-list` command.
But then I repeated the `archive` measurement with the `develop` version
of zlib-ng, and while it was still substantially faster than zlib, it was
slightly slower than zlib-ng v2.0.6 (zlib: ~26 seconds, zlib-ng v2.0.6:
~14 seconds, zlib-ng develop: ~16 seconds). Still, much, much faster than
using `-c tar.tgz.command="gzip -cn"` at ~24 seconds.
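To illustrate why no `gzip` process is needed in the first place: zlib can emit a gzip-framed stream itself. The sketch below uses Python's `zlib` binding rather than Git's C code, and `gzip_in_process` is a hypothetical helper name; passing `wbits` of 16 + 15 asks zlib to write the gzip header and trailer.

```python
import gzip
import zlib


def gzip_in_process(chunks, level=-1):
    """Compress an iterable of byte chunks into one gzip stream using zlib."""
    # wbits = 16 + MAX_WBITS selects gzip framing with a 32 KiB window.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    out = bytearray()
    for chunk in chunks:
        out += compressor.compress(chunk)
    out += compressor.flush()
    return bytes(out)


if __name__ == "__main__":
    blocks = [b"x" * 512 for _ in range(20)]
    data = gzip_in_process(blocks)
    # The result is a regular gzip stream that any gzip reader accepts.
    assert gzip.decompress(data) == b"".join(blocks)
```

Which zlib implementation backs the binding (zlib or a zlib-ng build in compat mode) is decided at link time, so the calling code stays the same either way.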
So: the picture is messy. The latest official release of zlib-ng seems to
offer performance wins using `archive` but slight losses using `rev-list`.
Upgrading to the latest revision of zlib-ng offers slightly smaller
performance wins using `archive` and equivalent performance using
`rev-list`. Both blow `gzip -cn` out of the water, thanks to using MMX or
whatever my laptop's CPU offers.
The take-away as far as Git for Windows is concerned: It is not _quite_
time yet to switch from zlib to zlib-ng; I want to wait until there is
an official zlib-ng release with favorable speed.
Ciao,
Dscho
P.S.: I pushed a WIP update to this branch:
> Footnote *1*: https://github.com/msys2/MINGW-packages/compare/master...dscho:zlib-ng