git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Duy Nguyen <pclouds@gmail.com>, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	Marc Stevens <marc.stevens@cwi.nl>,
	Dan Shumow <danshu@microsoft.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] Put sha1dc on a diet
Date: Thu, 2 Mar 2017 13:37:27 -0500	[thread overview]
Message-ID: <85221b97-759f-b7a9-1256-21515d163cbf@jeffhostetler.com> (raw)
In-Reply-To: <CA+55aFzscLaviJac-SB65WFYViY=wyAF3EWOnhHSuzSuFLdPTA@mail.gmail.com>



On 3/2/2017 11:35 AM, Linus Torvalds wrote:
> On Thu, Mar 2, 2017 at 6:45 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>> It would probably make sense to switch the index integrity check away from
>> SHA-1 because we really only care about detecting bit flips there, and we
>> have no need for the computational overhead of using a full-blown
>> cryptographic hash for that purpose.
> Which index do you actually see as being a problem, btw? The main file
> index (.git/index) or the pack-file indexes?
>
> We definitely don't need the checking version of sha1 for either of
> those, but as Jeff already did the math, at least the pack-file index
> is almost negligible, because the pack-file operations that update it
> end up doing SHA1 over the objects - and the object SHA1 calculations
> are much bigger.
>
> And I don't think we even check the pack-file index hashes except on fsck.
>
> Now, if your _file_ index is 300-400MB (and I do think we check the
> SHA fingerprint on that even on just reading it - verify_hdr() in
> do_read_index()), then that's going to be a somewhat noticeable hit on
> every normal "git diff" etc.

Yes, the .git/index is 450MB with ~3.1M entries.  verify_hdr() is called 
each time
we read it into memory.

We have been testing a patch in GfW to run the verification in a 
separate thread
while the main thread parses (and mallocs) the cache_entries.  This does 
help
offset the time.
         https://github.com/git-for-windows/git/pull/978/files

> But I'd have expected the stat() calls of all the files listed by that
> index to be the _much_ bigger problem in that case. Or do you just
> turn those off with assume-unchanged?
>
> Yeah, those stat calls are threaded when preloading, but even so..

Yes, the stat() calls are more significant percentage of the time (and 
having
core.fscache and core.preloadindex help that greatly), but the total 
time for a command
is just that -- the total -- so using the philosophy of "every little 
bit helps", the faster
routines help us here.

> Anyway, the file index SHA1 checking could probably just be disabled
> entirely (with a config flag). It's a corruption check that simply
> isn't that important. So if that's your main SHA1 issue, that would be
> easy to fix.

Yes, in the GVFS effort, we disabled the verification with a config 
setting and haven't
had any incidents.


> Everything else - like pack-file generation etc for a big clone() may
> end up using a ton of SHA1 too, but the SHA1 costs all scale with the
> other costs that drown them out (ie zlib, network, etc).
>
> I'd love to see a profile if you have one.
>
>                        Linus


  reply	other threads:[~2017-03-02 18:46 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-01  0:30 [PATCH] Put sha1dc on a diet Linus Torvalds
2017-03-01 18:42 ` Junio C Hamano
2017-03-01 18:49   ` Linus Torvalds
2017-03-01 19:41     ` Junio C Hamano
2017-03-01 21:56       ` Johannes Schindelin
2017-03-01 22:05         ` Junio C Hamano
2017-03-01 22:16         ` Linus Torvalds
2017-03-01 22:51           ` Johannes Schindelin
2017-03-01 23:05             ` Linus Torvalds
2017-03-01 23:19               ` Jeff King
2017-03-02  6:10                 ` Duy Nguyen
2017-03-02 14:45                   ` Johannes Schindelin
2017-03-02 16:35                     ` Linus Torvalds
2017-03-02 18:37                       ` Jeff Hostetler [this message]
2017-03-02 19:04                         ` Linus Torvalds
2017-03-02 14:39                 ` Johannes Schindelin
2017-03-02 14:37               ` Johannes Schindelin
2017-03-01 19:53     ` Jeff King
     [not found]       ` <CA+55aFwf3sxKW+dGTMjNAeHMOf=rvctEQohm+rbhEb=e3KLpHw@mail.gmail.com>
2017-03-01 20:34         ` Jeff King
     [not found]           ` <CA+55aFwr1jncrk-cekn0Y8rs_S+zs7RrgQ-Jb-ZbgCvmVrHT_A@mail.gmail.com>
2017-03-01 23:13             ` Jeff King
2017-03-01 23:38           ` Linus Torvalds
2017-03-02  1:31             ` Dan Shumow
2017-03-02  4:38               ` Junio C Hamano
2017-03-04  1:07                 ` Dan Shumow
2017-03-13 15:13                   ` Jeff King
     [not found]                     ` <CY1PR0301MB2107B3C5131D5DC7F91A0147C4250@CY1PR0301MB2107.namprd03.prod.outlook.com>
     [not found]                       ` <CY1PR0301MB2107876B6E47FBCF03AB1EA1C4250@CY1PR0301MB2107.namprd03.prod.outlook.com>
2017-03-13 19:48                         ` Jeff King
2017-03-13 20:12                           ` Marc Stevens
2017-03-13 20:20                             ` Linus Torvalds
2017-03-13 20:47                               ` Marc Stevens
2017-03-13 21:00                                 ` Jeff King
2017-03-13 21:15                                   ` Marc Stevens
2017-03-16 18:22                                     ` Marc Stevens
2017-03-16 22:06                                       ` Jeff King
2017-03-16 22:07                                         ` Dan Shumow
2017-03-01 19:07 ` Jeff King
2017-03-01 19:10   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85221b97-759f-b7a9-1256-21515d163cbf@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=danshu@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=marc.stevens@cwi.nl \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).