git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: "René Scharfe" <l.s.r@web.de>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Rohit Ashiwal" <rohit.ashiwal265@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"brian m . carlson" <sandals@crustytoothpaste.net>
Subject: Re: [PATCH v3 0/5] Avoid spawning gzip in git archive
Date: Thu, 30 Jun 2022 20:55:38 +0200 (CEST)	[thread overview]
Message-ID: <ps52p06s-01nr-4ss2-r802-6nsp5nqq5199@tzk.qr> (raw)
In-Reply-To: <0aa5c101-06bf-325c-efbc-6b4ef38616c5@web.de>

[-- Attachment #1: Type: text/plain, Size: 3860 bytes --]

Hi René,

On Tue, 14 Jun 2022, René Scharfe wrote:

> Am 14.06.22 um 13:28 schrieb Johannes Schindelin:
> >
> > By the way, the main reason why I did not work more is that in
> > http://madler.net/pipermail/zlib-devel_madler.net/2019-December/003308.html,
> > Mark Adler (the zlib maintainer) announced that...
> >
> >> [...] There are many well-tested performance improvements in zlib
> >> waiting in the wings that will be incorporated over the next several
> >> months. [...]
> >
> > This was in December 2019. And now it's June 2022 and I kind of wonder
> > whether those promised improvements will still come.
> >
> > In the meantime, however, a viable alternative seems to have cropped up:
> > https://github.com/zlib-ng/zlib-ng. Essentially, it looks as if it is what
> > zlib should have become after above-quoted announcement.
> >
> > In particular the CPU intrinsics support (think MMX, SSE2/3, etc) seem to
> > be very interesting and I would not be completely surprised if building
> > Git with your patches and linking against zlib-ng would paint a very
> > favorable picture not only in terms of CPU time but also in terms of
> > wallclock time. Sadly, I have not been able to set aside time to look into
> > that angle, but maybe I can peak your interest?
> I was unable to preload zlib-ng using DYLD_INSERT_LIBRARIES on macOS
> 12.4 so far.  The included demo proggy looks impressive, though:
>
> $ hyperfine -w3 -L gzip gzip,../zlib-ng/minigzip "git -C ../linux archive --format=tar HEAD | {gzip} -c"
> Benchmark #1: git -C ../linux archive --format=tar HEAD | gzip -c
>   Time (mean ± σ):     20.424 s ±  0.006 s    [User: 23.964 s, System: 0.432 s]
>   Range (min … max):   20.414 s … 20.434 s    10 runs
>
> Benchmark #2: git -C ../linux archive --format=tar HEAD | ../zlib-ng/minigzip -c
>   Time (mean ± σ):     12.158 s ±  0.006 s    [User: 13.908 s, System: 0.376 s]
>   Range (min … max):   12.145 s … 12.166 s    10 runs
>
> Summary
>   'git -C ../linux archive --format=tar HEAD | ../zlib-ng/minigzip -c' ran
>     1.68 ± 0.00 times faster than 'git -C ../linux archive --format=tar HEAD | gzip -c'

Intriguing.

I finally managed to play around with building and packaging zlib-ng [*1*]
(since I want to use it as a drop-in replacement for zlib, I think it is
best to configure it with `--zlib-compat`, that way I do not have to
fiddle with any equivalent of `LD_PRELOAD`). Here are my numbers:

	zlib-ng: 14.409 s ± 0.209 s
	zlib:    26.843 s ± 0.636 s

These are pretty good, which made me think that they might actually even
help regular Git operations (because we zlib every loose object).

So I tried to `fast-import` some 2500 commits from linux.git into a fresh
repository, and the zlib-ng version takes ~51s and the zlib version takes
~58s. At first I thought that it might be noise, but the trend seems to be
steady. It's not a huge improvement, of course, but I think that might be
because most of the time is spent parsing.

I then tried to test the performance focusing on writing loose object, by
using p0008 (increasing the number of files from 50 to 1500 and
restricting it to fsyncMethod=none).

Unfortunately, the numbers are not really conclusive. I do see minor
speed-ups with zlib-ng, mostly, in the single digit percentages, though
occasionally in the other direction. In other words, there is no clear-cut
change, just a vague tendency. My guess: Git writes too small files (their
contents are of the form "$basedir$test_tick.$counter") and zlib-ng's
superior performance does not come to bear.

Still, for larger workloads, zlib-ng seems to offer a quite nice and
substantial performance improvement over zlib.

Ciao,
Dscho

Footnote *1*: https://github.com/msys2/MINGW-packages/compare/master...dscho:zlib-ng

  reply	other threads:[~2022-06-30 18:56 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12 23:04 [PATCH 0/2] Avoid spawning gzip in git archive Johannes Schindelin via GitGitGadget
2019-04-12 23:04 ` [PATCH 1/2] archive: replace write_or_die() calls with write_block_or_die() Rohit Ashiwal via GitGitGadget
2019-04-13  1:34   ` Jeff King
2019-04-13  5:51     ` Junio C Hamano
2019-04-14  4:36       ` Rohit Ashiwal
2019-04-26 14:29       ` Johannes Schindelin
2019-04-26 23:44         ` Junio C Hamano
2019-04-29 21:32           ` Johannes Schindelin
2019-05-01 18:09             ` Jeff King
2019-05-02 20:29               ` René Scharfe
2019-05-05  5:25               ` Junio C Hamano
2019-05-06  5:07                 ` Jeff King
2019-04-14  4:34     ` Rohit Ashiwal
2019-04-14 10:33       ` Junio C Hamano
2019-04-26 14:28     ` Johannes Schindelin
2019-05-01 18:07       ` Jeff King
2019-04-12 23:04 ` [PATCH 2/2] archive: avoid spawning `gzip` Rohit Ashiwal via GitGitGadget
2019-04-13  1:51   ` Jeff King
2019-04-13 22:01     ` René Scharfe
2019-04-15 21:35       ` Jeff King
2019-04-26 14:51         ` Johannes Schindelin
2019-04-27  9:59           ` René Scharfe
2019-04-27 17:39             ` René Scharfe
2019-04-29 21:25               ` Johannes Schindelin
2019-05-01 17:45                 ` René Scharfe
2019-05-01 18:18                   ` Jeff King
2019-06-10 10:44                     ` René Scharfe
2019-06-13 19:16                       ` Jeff King
2019-04-13 22:16     ` brian m. carlson
2019-04-15 21:36       ` Jeff King
2019-04-26 14:54       ` Johannes Schindelin
2019-05-02 20:20         ` Ævar Arnfjörð Bjarmason
2019-05-03 20:49           ` Johannes Schindelin
2019-05-03 20:52             ` Jeff King
2019-04-26 14:47     ` Johannes Schindelin
     [not found] ` <pull.145.v2.git.gitgitgadget@gmail.com>
     [not found]   ` <4ea94a8784876c3a19e387537edd81a957fc692c.1556321244.git.gitgitgadget@gmail.com>
2019-05-02 20:29     ` [PATCH v2 3/4] archive: optionally use zlib directly for gzip compression René Scharfe
     [not found]   ` <ac2b2488a1b42b3caf8a84594c48eca796748e59.1556321244.git.gitgitgadget@gmail.com>
2019-05-02 20:30     ` [PATCH v2 2/4] archive-tar: mark RECORDSIZE/BLOCKSIZE as unsigned René Scharfe
2019-05-08 11:45       ` Johannes Schindelin
2019-05-08 23:04         ` Jeff King
2019-05-09 14:06           ` Johannes Schindelin
2019-05-09 18:38             ` Jeff King
2019-05-10 17:18               ` René Scharfe
2019-05-10 21:20                 ` Jeff King
2022-06-12  6:00 ` [PATCH v3 0/5] Avoid spawning gzip in git archive René Scharfe
2022-06-12  6:03   ` [PATCH v3 1/5] archive: rename archiver data field to filter_command René Scharfe
2022-06-12  6:05   ` [PATCH v3 2/5] archive-tar: factor out write_block() René Scharfe
2022-06-12  6:08   ` [PATCH v3 3/5] archive-tar: add internal gzip implementation René Scharfe
2022-06-13 19:10     ` Junio C Hamano
2022-06-12  6:18   ` [PATCH v3 4/5] archive-tar: use OS_CODE 3 (Unix) for internal gzip René Scharfe
2022-06-12  6:19   ` [PATCH v3 5/5] archive-tar: use internal gzip by default René Scharfe
2022-06-13 21:55     ` Junio C Hamano
2022-06-14 11:27       ` Johannes Schindelin
2022-06-14 15:47         ` René Scharfe
2022-06-14 15:56           ` René Scharfe
2022-06-14 16:29           ` Johannes Schindelin
2022-06-14 20:04             ` René Scharfe
2022-06-15 16:41               ` Junio C Hamano
2022-06-14 11:28   ` [PATCH v3 0/5] Avoid spawning gzip in git archive Johannes Schindelin
2022-06-14 20:05     ` René Scharfe
2022-06-30 18:55       ` Johannes Schindelin [this message]
2022-07-01 16:05         ` Johannes Schindelin
2022-07-01 16:27           ` Jeff King
2022-07-01 17:47             ` Junio C Hamano
2022-06-15 16:53 ` [PATCH v4 0/6] " René Scharfe
2022-06-15 16:58   ` [PATCH v4 1/6] archive: update format documentation René Scharfe
2022-06-15 16:59   ` [PATCH v4 2/6] archive: rename archiver data field to filter_command René Scharfe
2022-06-15 17:01   ` [PATCH v4 3/6] archive-tar: factor out write_block() René Scharfe
2022-06-15 17:02   ` [PATCH v4 4/6] archive-tar: add internal gzip implementation René Scharfe
2022-06-15 20:32     ` Ævar Arnfjörð Bjarmason
2022-06-16 18:55       ` René Scharfe
2022-06-24 11:13         ` Ævar Arnfjörð Bjarmason
2022-06-24 20:24           ` René Scharfe
2022-06-15 17:04   ` [PATCH v4 5/6] archive-tar: use OS_CODE 3 (Unix) for internal gzip René Scharfe
2022-06-15 17:05   ` [PATCH v4 6/6] archive-tar: use internal gzip by default René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ps52p06s-01nr-4ss2-r802-6nsp5nqq5199@tzk.qr \
    --to=johannes.schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    --cc=rohit.ashiwal265@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).