From: Linus Torvalds <torvalds@linux-foundation.org>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Duy Nguyen <pclouds@gmail.com>, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	Marc Stevens <marc.stevens@cwi.nl>,
	Dan Shumow <danshu@microsoft.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] Put sha1dc on a diet
Date: Thu, 2 Mar 2017 08:35:36 -0800
Message-ID: <CA+55aFzscLaviJac-SB65WFYViY=wyAF3EWOnhHSuzSuFLdPTA@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.20.1703021539330.3767@virtualbox>
On Thu, Mar 2, 2017 at 6:45 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> It would probably make sense to switch the index integrity check away from
> SHA-1 because we really only care about detecting bit flips there, and we
> have no need for the computational overhead of using a full-blown
> cryptographic hash for that purpose.

Which index do you actually see as being a problem, btw? The main file
index (.git/index) or the pack-file indexes?

We definitely don't need the checking version of sha1 for either of
those, and as Jeff's math already showed, at least the pack-file index
cost is almost negligible, because the pack-file operations that update
it end up doing SHA1 over the objects - and the object SHA1 calculations
are much bigger.

And I don't think we even check the pack-file index hashes except on fsck.

Now, if your _file_ index is 300-400MB (and I do think we check the
SHA1 fingerprint on that even on just reading it - verify_hdr() in
do_read_index()), then that's going to be a somewhat noticeable hit on
every normal "git diff" etc.
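
For illustration, the read-time check described above amounts to roughly
the following. This is a minimal Python sketch, not git's C code: it
relies only on the index file ending in a 20-byte SHA-1 of everything
before it. The function name index_checksum_ok and the chunk_size
parameter are made up for this example.

```python
import hashlib
import os


def index_checksum_ok(path, chunk_size=1 << 20):
    """Return True if the last 20 bytes of the file at `path` equal the
    SHA-1 of all the bytes that precede them.

    Hypothetical helper for illustration; it is not part of git.
    """
    size = os.path.getsize(path)
    if size < 20:
        return False  # too short to even hold the trailing checksum

    remaining = size - 20
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Hash in chunks so a 300-400MB index is never held in memory at once.
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                return False  # file shrank while we were reading it
            digest.update(chunk)
            remaining -= len(chunk)
        stored = f.read(20)
    return digest.digest() == stored
```

The whole file has to pass through the hash before the answer is known,
which is why the cost grows with index size on every command that reads
the index.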

But I'd have expected the stat() calls of all the files listed by that
index to be the _much_ bigger problem in that case. Or do you just
turn those off with assume-unchanged?

Yeah, those stat calls are threaded when preloading, but even so...

Anyway, the file index SHA1 checking could probably just be disabled
entirely (with a config flag). It's a corruption check that simply
isn't that important. So if that's your main SHA1 issue, that would be
easy to fix.
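
A rough sketch of what such a switch could look like, again in Python
rather than git's C. The verify_checksum parameter is a hypothetical
stand-in for the suggested config flag, and read_index_header is an
invented name. The only facts assumed about the format are the 12-byte
header ("DIRC" signature, version, entry count, all big-endian) and the
trailing 20-byte SHA-1.

```python
import hashlib
import struct


def read_index_header(data, verify_checksum=True):
    """Parse the index header from `data` (the whole file as bytes) and
    return (version, entry_count).

    `verify_checksum` is a hypothetical switch standing in for a config
    flag; when False, the trailing SHA-1 is never computed.
    """
    if len(data) < 12 + 20:
        raise ValueError("index too short")

    # Header: 4-byte signature, 32-bit version, 32-bit entry count.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("bad index signature")

    # The expensive part: hashing everything except the 20-byte trailer.
    if verify_checksum and hashlib.sha1(data[:-20]).digest() != data[-20:]:
        raise ValueError("index checksum mismatch")

    return version, entry_count
```

With the switch off, the cheap structural checks on the header still
run, and only the full-file hash is skipped.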

Everything else - like pack-file generation for a big clone - may
end up using a ton of SHA1 too, but those SHA1 costs all scale with the
other costs that drown them out (zlib, network, etc.).

I'd love to see a profile if you have one.

Linus