git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Johannes Schindelin <johannes.schindelin@gmx.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/7] PREVIEW: Introduce DC_AND_OPENSSL_SHA1 make flag
Date: Fri, 24 Mar 2017 23:37:54 -0700	[thread overview]
Message-ID: <xmqq7f3d6ev1.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <cover.1490397869.git.johannes.schindelin@gmx.de> (Johannes Schindelin's message of "Sat, 25 Mar 2017 00:24:42 +0100 (CET)")

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> - the most important part will be the patch turning core.enableSHA1DC
>   into a tristate: "externalOnly" or "smart" or "auto" or something
>   indicating that it switches on collision detection only for commands
>   that accept objects from an outside source into the local repository,
>   such as fetch, fast-import, etc

There are different uses of SHA-1 hashing in Git, and I do agree
that depending on the use, some of them do not need the overhead for
the collision-attack detection.  As DC_SHA1 with attack detection
disabled may still be slower than other implementations, it may make
sense to have a compile-time option to use DC_SHA1 for places that
needs protection, and another implementation [*1*] for places that
don't.

I think the places that MUST use DC variant is anything that hashes
content for a single object to compute its object name.  "git add"
[*2*] that creates a new loose object, "git index-pack" that reads
incoming pack stream over the wire, reconstitutes each object data
while resolving delta and hash them to get their names to construct
the .idx file and "git unpack-objects" that does the same but
explodes the pack contents into loose objects, "git write-tree" that
creates a new tree object given the contents of the index, etc.

One notable exception of the above is "update-index --refresh".  We
already have contents in the index and in the object store, and we
are hashing the contents in the working tree to see if it hashes to
the same value.  When the hash does not match, it won't go in to the
object store.  When the hash does match, it either is indeed the
same content (i.e. no collision), in which case we earlier must have
done the collision-attack detecting hash when we added the object to
the object store.  Or the object we have in the object store and
what is in the working tree are different contents that hash to the
same name, in which case the user already has colliding pair and it
is too late to invoke collision-attack detecting variant ;-)

The running checksum over the whole file csum-file.c computes does
not have to be the collision-attack detecting kind.  This is the
hash at the end of various files like the index, .pack .idx, etc.
These are used to protect us against bit-flipping disks and we are
not fighting with a clever disk that can do collision attack.  For
that matter, some of these checksums do not even have to be SHA-1.
If one hacks his own Git to replace SHA-1 checksum at the end of the
index file with something faster and weaker and use it in one's
repository, nobody else would notice nor care.  The same thing can
be said for the .idx file.  The one at the end of .pack does get
checked at the receiving end when it comes over the wire, so it MUST
be SHA-1, but it does not have to be hashed with collision-attack
detection.

The rerere database is indexed with SHA-1 of (normalized) conflict
text.  This does not have to be hashed with the collision-attack
detection logic.  Thanks to recent update that allows multiple pairs
of conflict and its resolution, the subsystem is prepared to see two
conflicts that share the same hash already (for completely different
reasons).

The hash that names a packfile is constructed by sorting all the
names of the objects contained in the packfile and running SHA-1
hash over it.  I think this MUST be hashed with collision-attack
detection.  A malicious site can feed you a packfile that contains
objects the site crafts so that the sorted object names would result
in a collision-attack, ending up with one pack that contains a sets
of objects different from another pack that happens to have the same
packname, causing Git to say "Ah, this new pack must have the same
set of objects as the pack we already have" and discard it,
resulting in lost objects and a corrupt repository with missing
objects.


[Footnote]

*1* I think this PREVIEW hardcodes OpenSSL only for illustration and
    that is OK for a preview.  Given the recent news on licensing,
    however, if we want to pursue this dual hashing scheme, we must
    consider allowing other implementations as well in the final
    form.

*2* In this paragraph, whenever "git" command is named, I mean both
    the command and its underlying machinery.  When I say "git
    write-tree", for example, write_index_as_tree() obviously is
    included.

  parent reply	other threads:[~2017-03-25  6:38 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-24 23:24 [PATCH 0/7] PREVIEW: Introduce DC_AND_OPENSSL_SHA1 make flag Johannes Schindelin
2017-03-24 23:24 ` [PATCH 1/7] sha1dc: safeguard against outside definitions of BIGENDIAN Johannes Schindelin
2017-03-24 23:24 ` [PATCH 2/7] Makefile: optionally compile with both SHA1DC and SHA1_OPENSSL Johannes Schindelin
2017-03-25 19:51   ` Ævar Arnfjörð Bjarmason
2017-03-30 16:16   ` Junio C Hamano
2017-03-30 16:47     ` Junio C Hamano
2017-04-18 11:28     ` Johannes Schindelin
2017-03-24 23:24 ` [PATCH 3/7] config: add the core.enablesha1dc setting Johannes Schindelin
2017-03-24 23:25 ` [PATCH 4/7] t0013: do not skip the entire file wholesale without DC_SHA1 Johannes Schindelin
2017-03-24 23:25 ` [PATCH 5/7] t0013: test DC_AND_OPENSSL_SHA1, too Johannes Schindelin
2017-03-24 23:28 ` [PATCH 6/7] mingw: enable DC_AND_OPENSSL_SHA1 by default Johannes Schindelin
2017-03-24 23:28 ` [PATCH 7/7] p0013: new test to compare SHA1DC vs OpenSSL Johannes Schindelin
2017-03-25  6:37 ` Junio C Hamano [this message]
2017-03-25 16:58   ` [PATCH 0/7] PREVIEW: Introduce DC_AND_OPENSSL_SHA1 make flag Junio C Hamano
2017-03-26  6:18   ` Jeff King
2017-03-26 23:16     ` Junio C Hamano
2017-03-27  1:11       ` Jeff King
2017-03-27  6:07         ` Junio C Hamano
2017-03-27  7:09           ` Jeff King
2017-03-27 17:15             ` Junio C Hamano
2017-03-29 20:02   ` Johannes Schindelin
2017-03-30  0:31     ` Junio C Hamano
2017-04-18 11:30       ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq7f3d6ev1.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=johannes.schindelin@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).