From: Jeff King <peff@peff.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>,
Dan Shumow <danshu@microsoft.com>,
Marc Stevens <marc.stevens@cwi.nl>
Subject: Re: [PATCH] Put sha1dc on a diet
Date: Wed, 1 Mar 2017 15:34:27 -0500
Message-ID: <20170301203427.e5xa5ej3czli7c3o@sigill.intra.peff.net>
In-Reply-To: <CA+55aFwf3sxKW+dGTMjNAeHMOf=rvctEQohm+rbhEb=e3KLpHw@mail.gmail.com>
On Wed, Mar 01, 2017 at 12:14:34PM -0800, Linus Torvalds wrote:
> > My biggest concern is the index-pack operation. Try this:
>
> I'm mobile right now, so I can't test, but isn't this perhaps at least partly
> due to the full checksum over the pack-file?
>
> We have two very different uses of SHA1: the actual object name hash, but
> also the sha1file checksums that we do on the index file and the pack files.
>
> And the checksum code really doesn't need the collision checking at all.
I don't think that helps. The sha1 over the pack-file takes about 1.3s
with openssl, and 5s with sha1dc. So we already know the increase there
is only a few seconds, not a few minutes.
And it makes sense if you think about the index-pack operation. It has
to inflate each object, resolving deltas, and checksum the result. And
the number of inflated bytes is _much_ larger than the on-disk bytes.
You can see the difference with:

  git cat-file --batch-all-objects \
               --batch-check='%(objectsize:disk) %(objectsize)' |
  perl -alne '
    $disk += $F[0]; $raw += $F[1];
    END { print "$disk $raw" }
  '

On linux.git that yields:

  1210521959 63279680406

That's over a 50x increase in the bytes we have to sha1 for objects
versus pack-checksums.
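
As a rough sanity check on those numbers, the sketch below scales the measured pack-checksum times by the raw/disk byte ratio. It assumes hashing cost grows linearly with the number of bytes hashed and ignores inflation, delta resolution, and per-object overhead, so treat the result as a ballpark only. The `estimate` function is a hypothetical helper for illustration, not part of git.

```shell
# Hypothetical helper (not part of git): print the raw/disk byte ratio and
# a linearly scaled estimate of the time spent hashing inflated objects.
# Assumption: sha1 cost is proportional to the number of bytes hashed.
estimate() {
	# $1 = on-disk bytes, $2 = inflated bytes, $3 = seconds to sha1 the pack
	awk -v disk="$1" -v raw="$2" -v secs="$3" \
		'BEGIN { printf "%.1fx %.0fs\n", raw / disk, secs * raw / disk }'
}

estimate 1210521959 63279680406 1.3   # openssl: prints "52.3x 68s"
estimate 1210521959 63279680406 5     # sha1dc:  prints "52.3x 261s"
```

Under that assumption the gap between the two implementations is roughly three minutes for object hashing, versus a few seconds for the pack checksum alone.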
-Peff