git@vger.kernel.org mailing list mirror (one of many)
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Duy Nguyen <pclouds@gmail.com>, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	Marc Stevens <marc.stevens@cwi.nl>,
	Dan Shumow <danshu@microsoft.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] Put sha1dc on a diet
Date: Thu, 2 Mar 2017 08:35:36 -0800
Message-ID: <CA+55aFzscLaviJac-SB65WFYViY=wyAF3EWOnhHSuzSuFLdPTA@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.20.1703021539330.3767@virtualbox>

On Thu, Mar 2, 2017 at 6:45 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> It would probably make sense to switch the index integrity check away from
> SHA-1 because we really only care about detecting bit flips there, and we
> have no need for the computational overhead of using a full-blown
> cryptographic hash for that purpose.

Which index do you actually see as being a problem, btw? The main file
index (.git/index) or the pack-file indexes?

We definitely don't need the checking version of sha1 for either of
those, but as Jeff already did the math, at least the pack-file index
is almost negligible, because the pack-file operations that update it
end up doing SHA1 over the objects - and the object SHA1 calculations
are much bigger.

And I don't think we even check the pack-file index hashes except on fsck.

Now, if your _file_ index is 300-400MB (and I do think we check the
SHA fingerprint on that even on just reading it - verify_hdr() in
do_read_index()), then that's going to be a somewhat noticeable hit on
every normal "git diff" etc.
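
[Illustration: a minimal Python sketch of the kind of check described
above, not git's actual C code. It assumes the on-disk index layout of a
"DIRC" signature at the start and a trailing 20-byte SHA-1 computed over
every byte before it. The function name index_checksum_ok is
hypothetical. The whole file is hashed, so the cost grows linearly with
index size.]

```python
import hashlib

SIGNATURE = b"DIRC"   # index file signature
HEADER_LEN = 12       # signature + version + entry count, 4 bytes each
SHA1_LEN = 20         # trailing checksum size


def index_checksum_ok(data: bytes) -> bool:
    """Return True if the trailing SHA-1 matches the preceding bytes."""
    if len(data) < HEADER_LEN + SHA1_LEN or data[:4] != SIGNATURE:
        return False
    body, trailer = data[:-SHA1_LEN], data[-SHA1_LEN:]
    # Hashing touches every byte of the index, which is why a
    # multi-hundred-megabyte index makes this check noticeable.
    return hashlib.sha1(body).digest() == trailer
```

In a real repository one might call it as
index_checksum_ok(open(".git/index", "rb").read()).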

But I'd have expected the stat() calls of all the files listed by that
index to be the _much_ bigger problem in that case. Or do you just
turn those off with assume-unchanged?

Yeah, those stat calls are threaded when preloading, but even so..
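
[Illustration: a rough Python sketch of spreading per-file stat calls
across threads, in the spirit of the preloading mentioned above. This is
not git's implementation; the function name stat_all and the worker
count are assumptions made for the example.]

```python
import os
from concurrent.futures import ThreadPoolExecutor


def stat_all(paths, workers=8):
    """Return {path: os.stat_result or None}, stat'ing paths in parallel."""
    def stat_one(path):
        try:
            # lstat does not follow symlinks, so a link is reported as itself.
            return path, os.lstat(path)
        except OSError:
            # Missing or unreadable entries are reported as None.
            return path, None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(stat_one, paths))
```

A caller would then compare each result's size and mtime against the
values cached in the index to decide which files need a closer look.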

Anyway, the file index SHA1 checking could probably just be disabled
entirely (with a config flag). It's a corruption check that simply
isn't that important. So if that's your main SHA1 issue, that would be
easy to fix.

Everything else - like pack-file generation etc for a big clone() may
end up using a ton of SHA1 too, but the SHA1 costs all scale with the
other costs that drown them out (ie zlib, network, etc).

I'd love to see a profile if you have one.

                      Linus
