From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH] hash-object: don't pointlessly zlib compress without -w
Date: Tue, 21 May 2019 00:29:32 +0200
Message-ID: <20190520222932.22843-1-avarab@gmail.com>
When hash-object hashes something the size of core.bigFileThreshold or
larger (512MB by default) it'll be streamed through
stream_to_pack().
That function, added in 568508e765 ("bulk-checkin: replace fast-import
based implementation", 2011-10-28), compresses the file with zlib
regardless of whether the content will actually be written out to
disk, which only happens when hash-object is called with the "-w"
option.
Hashing is much slower if we need to compress the content, so let's
only compress when the HASH_WRITE_OBJECT flag has been given, and use
Z_NO_COMPRESSION otherwise.
An accompanying perf test shows how much this improves things. With
CFLAGS=-O3 and OPENSSL_SHA1=Y the relevant change is (manually
reformatted to avoid long lines):
1007.6: 'git hash-object <file>' with threshold=32M
-> 1.57(1.55+0.01) 0.09(0.09+0.00) -94.3%
1007.7: 'git hash-object --stdin < <file>' with threshold=32M
-> 1.57(1.57+0.00) 0.09(0.07+0.01) -94.3%
1007.8: 'echo <file> | git hash-object --stdin-paths' threshold=32M
-> 1.59(1.56+0.00) 0.09(0.08+0.00) -94.3%
The same tests using "-w" still take as long as before, since those
need to zlib compress the object in order to write it. With the
sha1collisiondetection library (our default) there's less of a
difference since the hashing itself is slower. The respective numbers
for the same three tests:
1.71(1.65+0.01) 0.19(0.18+0.01) -88.9%
1.70(1.66+0.02) 0.19(0.19+0.00) -88.8%
1.69(1.66+0.00) 0.19(0.18+0.00) -88.8%
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
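Not part of the commit message: a minimal sketch of why the compression
is wasted work without "-w". It assumes a SHA-1 repository and a POSIX
shell with sha1sum(1) available; blob_oid is a hypothetical helper for
illustration, not something git ships. The object ID is the SHA-1 of a
"blob <size>" header, a NUL byte, and the raw content, so zlib never
influences the result:

```shell
# Hypothetical helper (not part of git): compute the object ID that
# "git hash-object <file>" reports in a SHA-1 repository, using only
# the uncompressed content. No zlib is involved at any point.
blob_oid () {
	# Size in bytes; tr strips the padding some wc(1) implementations add.
	size=$(wc -c <"$1" | tr -d ' ') &&
	# Hash the "blob <size>" header, a NUL byte, then the raw content.
	{ printf 'blob %s\000' "$size" && cat "$1"; } |
	sha1sum |
	cut -d' ' -f1
}

printf 'hello\n' >hello.txt
blob_oid hello.txt
```

This should print the same ID as "git hash-object hello.txt", with or
without this patch; only the "-w" case needs the deflated bytes.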
bulk-checkin.c | 3 ++-
t/perf/p1007-hash-object.sh | 53 +++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 1 deletion(-)
create mode 100755 t/perf/p1007-hash-object.sh
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 39ee7d6107..a26126ee76 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -105,8 +105,9 @@ static int stream_to_pack(struct bulk_checkin_state *state,
 	int status = Z_OK;
 	int write_object = (flags & HASH_WRITE_OBJECT);
 	off_t offset = 0;
+	int level = write_object ? pack_compression_level : Z_NO_COMPRESSION;
 
-	git_deflate_init(&s, pack_compression_level);
+	git_deflate_init(&s, level);
 
 	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size);
 	s.next_out = obuf + hdrlen;
diff --git a/t/perf/p1007-hash-object.sh b/t/perf/p1007-hash-object.sh
new file mode 100755
index 0000000000..8df6dc59a5
--- /dev/null
+++ b/t/perf/p1007-hash-object.sh
@@ -0,0 +1,53 @@
+#!/bin/sh
+
+test_description="Tests performance of hash-object"
+. ./perf-lib.sh
+
+test_perf_fresh_repo
+
+test_lazy_prereq SHA1SUM_AND_SANE_DD_AND_URANDOM '
+	>empty &&
+	sha1sum empty >empty.sha1sum &&
+	grep -q -w da39a3ee5e6b4b0d3255bfef95601890afd80709 empty.sha1sum &&
+	dd if=/dev/urandom of=random.test bs=1024 count=1 &&
+	stat -c %s random.test >random.size &&
+	grep -q -x 1024 random.size
+'
+
+if test_have_prereq !SHA1SUM_AND_SANE_DD_AND_URANDOM
+then
+	skip_all='failed prereq check for sha1sum/dd/stat'
+	test_perf 'dummy p1007 test (skipped all tests)' 'true'
+	test_done
+fi
+
+test_expect_success 'setup 64MB file.random file' '
+	dd if=/dev/urandom of=file.random count=$((64*1024)) bs=1024
+'
+
+test_perf 'sha1sum(1) on file.random (for comparison)' '
+	sha1sum file.random
+'
+
+for threshold in 32M 64M
+do
+	for write in '' ' -w'
+	do
+		for literally in ' --literally -t commit' ''
+		do
+			test_perf "'git hash-object$write$literally <file>' with threshold=$threshold" "
+				git -c core.bigFileThreshold=$threshold hash-object$write$literally file.random
+			"
+
+			test_perf "'git hash-object$write$literally --stdin < <file>' with threshold=$threshold" "
+				git -c core.bigFileThreshold=$threshold hash-object$write$literally --stdin <file.random
+			"
+
+			test_perf "'echo <file> | git hash-object$write$literally --stdin-paths' threshold=$threshold" "
+				echo file.random | git -c core.bigFileThreshold=$threshold hash-object$write$literally --stdin-paths
+			"
+		done
+	done
+done
+
+test_done
--
2.21.0.1020.gf2820cf01a