git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>,
	Dan Shumow <danshu@microsoft.com>,
	Marc Stevens <marc.stevens@cwi.nl>
Subject: Re: [PATCH] Put sha1dc on a diet
Date: Wed, 1 Mar 2017 15:34:27 -0500	[thread overview]
Message-ID: <20170301203427.e5xa5ej3czli7c3o@sigill.intra.peff.net> (raw)
In-Reply-To: <CA+55aFwf3sxKW+dGTMjNAeHMOf=rvctEQohm+rbhEb=e3KLpHw@mail.gmail.com>

On Wed, Mar 01, 2017 at 12:14:34PM -0800, Linus Torvalds wrote:

> > My biggest concern is the index-pack operation. Try this:
> 
> I'm mobile right now, so I can't test, but I'd this perhaps at least partly
> due to the full checksum over the pack-file?
>
> We have two very different uses of SHA1: the actual object name hash, but
> also the sha1file checksums that we do on the index file and the pack files.
>
> And the checksum code really doesn't need the collision checking at all.

I don't think that helps. The sha1 over the pack-file takes about 1.3s
with openssl, and 5s with sha1dc. So we already know the increase there
is only a few seconds, not a few minutes.

And it makes sense if you think about the index-pack operation. It has
to inflate each object, resolving deltas, and checksum the result. And
the number of inflated bytes is _much_ larger than the on-disk bytes.
You can see the difference with:

  git cat-file --batch-all-objects \
    --batch-check='%(objectsize:disk) %(objectsize)' |
  perl -alne '
    $disk += $F[0]; $raw += $F[1];
    END { print "$disk $raw" }
  '

On linux.git that yields:

  1210521959 63279680406

That's over a 50x increase in the bytes we have to sha1 for objects
versus pack-checksums.

-Peff

  parent reply	other threads:[~2017-03-01 21:35 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-01  0:30 [PATCH] Put sha1dc on a diet Linus Torvalds
2017-03-01 18:42 ` Junio C Hamano
2017-03-01 18:49   ` Linus Torvalds
2017-03-01 19:41     ` Junio C Hamano
2017-03-01 21:56       ` Johannes Schindelin
2017-03-01 22:05         ` Junio C Hamano
2017-03-01 22:16         ` Linus Torvalds
2017-03-01 22:51           ` Johannes Schindelin
2017-03-01 23:05             ` Linus Torvalds
2017-03-01 23:19               ` Jeff King
2017-03-02  6:10                 ` Duy Nguyen
2017-03-02 14:45                   ` Johannes Schindelin
2017-03-02 16:35                     ` Linus Torvalds
2017-03-02 18:37                       ` Jeff Hostetler
2017-03-02 19:04                         ` Linus Torvalds
2017-03-02 14:39                 ` Johannes Schindelin
2017-03-02 14:37               ` Johannes Schindelin
2017-03-01 19:53     ` Jeff King
     [not found]       ` <CA+55aFwf3sxKW+dGTMjNAeHMOf=rvctEQohm+rbhEb=e3KLpHw@mail.gmail.com>
2017-03-01 20:34         ` Jeff King [this message]
     [not found]           ` <CA+55aFwr1jncrk-cekn0Y8rs_S+zs7RrgQ-Jb-ZbgCvmVrHT_A@mail.gmail.com>
2017-03-01 23:13             ` Jeff King
2017-03-01 23:38           ` Linus Torvalds
2017-03-02  1:31             ` Dan Shumow
2017-03-02  4:38               ` Junio C Hamano
2017-03-04  1:07                 ` Dan Shumow
2017-03-13 15:13                   ` Jeff King
     [not found]                     ` <CY1PR0301MB2107B3C5131D5DC7F91A0147C4250@CY1PR0301MB2107.namprd03.prod.outlook.com>
     [not found]                       ` <CY1PR0301MB2107876B6E47FBCF03AB1EA1C4250@CY1PR0301MB2107.namprd03.prod.outlook.com>
2017-03-13 19:48                         ` Jeff King
2017-03-13 20:12                           ` Marc Stevens
2017-03-13 20:20                             ` Linus Torvalds
2017-03-13 20:47                               ` Marc Stevens
2017-03-13 21:00                                 ` Jeff King
2017-03-13 21:15                                   ` Marc Stevens
2017-03-16 18:22                                     ` Marc Stevens
2017-03-16 22:06                                       ` Jeff King
2017-03-16 22:07                                         ` Dan Shumow
2017-03-01 19:07 ` Jeff King
2017-03-01 19:10   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170301203427.e5xa5ej3czli7c3o@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=danshu@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=marc.stevens@cwi.nl \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).