git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Lars Schneider <larsxschneider@gmail.com>,
	Jeff King <peff@peff.net>, Git Mailing List <git@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: USE_SHA1DC is broken in pu
Date: Thu, 23 Mar 2017 17:43:15 +0100 (CET)
Message-ID: <alpine.DEB.2.20.1703231724350.3767@virtualbox>
In-Reply-To: <20170322220246.GD26108@aiede.mtv.corp.google.com>

Hi Jonathan,

On Wed, 22 Mar 2017, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> 
> > As to the default of seriously slowing down all SHA-1 computations:
> > since you made that the default, at compile time, with no way to turn
> > on the faster computation, this will have a major, negative impact.
> > Are you really, really sure you want to do that?
> >
> > I thought that it was obvious that we would have at least a runtime
> > option to lessen the load.
> 
> It's not obvious to me.  I agree that the DC_SHA1 case can be sped up,
> e.g. by turning off the collision detection for sha1 calculations that
> are not part of fetching, receiving a push, or running fsck.

And in those cases, using OpenSSL instead is *even* faster.

> To be clear, are you saying that this is a bad compile-time default
> because distributors are going to leave it and end-users will end up
> with a bad experience?  Or are you saying distributors have no good
> alternative to choose at compile time?  Or something else?

What I am saying is that this should be a more fine-grained, runtime knob.

If I write out an index, I should not suffer the slowdown from detecting
collisions. Because I implicitly trust myself and everything that I added
(and everything that was checked before already). This may not matter with
small projects. But we know a couple of real-world scenarios where this
matters.

Imagine for example the insane repository described by my colleague Saeed
Noursalehi at GitMerge. It is *ginormous*.

The index is 300MB. If you have to experience a sudden drop in performance
of `git add`, even by "only" 30%, relative to OpenSSL, it is very
noticeable. It is painful.

That is the reason why we spent considerable time trying to enhance
performance of SHA-1 hashing even by as little as a couple of percentage
points here and there. The accumulated wins are noticeable, and
I assume that those wins are completely annihilated by the heavy-handed
switch to detect collisions always.

It gets even worse when it comes to fetching, let alone cloning.

And please note that the gigantic repository I mentioned above is a
company-internal one, i.e. the servers/repository are implicitly trusted.
Having to pay the price of a full clone going from 12+ hours to even only
15+ hours *hurts*. Particularly when that price is paid for no value in
return at all: the server *already* will have checked for crafted objects.

I could imagine that this problem could be addressed to everybody's
satisfaction by introducing a tristate config setting: collision detection
could be switched on, switched off, or set to, say, "external", i.e.
collision detection would be performed whenever objects are retrieved
from somewhere other than the local repository (e.g. git-receive-pack).

If fetching or cloning from a trusted source, this config setting could be
switched off on the command-line, otherwise left at "external".

And by "switching collision detection off", I of course refer to *not*
using SHA1DC's routines at all, but what would have been used originally,
in Git for Windows' case: (hardware-accelerated) OpenSSL.
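
To make the proposal concrete, here is a minimal sketch of how such a
tristate might be parsed and consulted. It is only an illustration: the
enum names, the accepted strings "on"/"off"/"external", and both helper
functions are hypothetical and do not exist in Git.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tristate; none of these names exist in Git. */
enum collision_check {
	COLLISION_CHECK_OFF,      /* never detect; use the fast backend, e.g. OpenSSL */
	COLLISION_CHECK_EXTERNAL, /* detect only for objects from outside the local repo */
	COLLISION_CHECK_ON        /* always detect, i.e. always use SHA1DC */
};

/* Where the data being hashed comes from (also hypothetical). */
enum object_origin {
	ORIGIN_LOCAL,   /* e.g. writing the index during `git add` */
	ORIGIN_EXTERNAL /* e.g. fetch, clone, git-receive-pack */
};

/* Parse the config value. Returns 0 on success, -1 on an unknown value. */
static int parse_collision_check(const char *value, enum collision_check *out)
{
	assert(value && out);
	if (!strcmp(value, "off"))
		*out = COLLISION_CHECK_OFF;
	else if (!strcmp(value, "external"))
		*out = COLLISION_CHECK_EXTERNAL;
	else if (!strcmp(value, "on"))
		*out = COLLISION_CHECK_ON;
	else
		return -1;
	return 0;
}

/* Returns 1 if the collision-detecting SHA-1 should be used, 0 otherwise. */
static int want_collision_detection(enum collision_check mode,
				    enum object_origin origin)
{
	switch (mode) {
	case COLLISION_CHECK_OFF:
		return 0;
	case COLLISION_CHECK_EXTERNAL:
		return origin == ORIGIN_EXTERNAL;
	case COLLISION_CHECK_ON:
		return 1;
	}
	return 1; /* unexpected mode: err on the side of detecting */
}
```

With the mode set to "external", writing the index would take the fast
path, while anything arriving via git-receive-pack would still be checked.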

Did I manage to clarify the problem?
Johannes

Thread overview: 18+ messages
2017-03-16 19:22 USE_SHA1DC is broken in pu Linus Torvalds
2017-03-16 19:41 ` Jeff King
2017-03-16 19:44   ` Linus Torvalds
2017-03-16 19:46     ` Junio C Hamano
2017-03-16 19:51       ` Linus Torvalds
2017-03-16 20:26         ` Linus Torvalds
2017-03-16 19:41 ` Junio C Hamano
2017-03-17  3:18 ` Lars Schneider
2017-03-17  3:32   ` Lars Schneider
2017-03-21 20:09     ` Johannes Schindelin
2017-03-21 20:16       ` Junio C Hamano
2017-03-22 14:32         ` Johannes Schindelin
2017-03-22 22:02           ` Jonathan Nieder
2017-03-23 16:43             ` Johannes Schindelin [this message]
2017-03-23 17:16               ` Linus Torvalds
2017-03-23 17:47                 ` Jeff King
2017-03-23 19:02                   ` Junio C Hamano
2017-03-23 22:22               ` Jonathan Nieder
