git@vger.kernel.org mailing list mirror (one of many)
From: Lars Schneider <larsxschneider@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, Git List <git@vger.kernel.org>
Subject: Re: SHA1 collision in production repo?! (probably not)
Date: Tue, 12 Sep 2017 18:18:32 +0200
Message-ID: <512E7480-E923-4EBF-BA9D-1FEEB99B8BA6@gmail.com>
In-Reply-To: <20170331174515.j2ruifuigskyvucc@sigill.intra.peff.net>


> On 31 Mar 2017, at 19:45, Jeff King <peff@peff.net> wrote:
> 
> On Fri, Mar 31, 2017 at 10:35:06AM -0700, Junio C Hamano wrote:
> 
>> Lars Schneider <larsxschneider@gmail.com> writes:
>> 
>>> Hi,
>>> 
>>> I just got a report with the following output after a "git fetch" operation
>>> using Git 2.11.0.windows.3 [1]:
>>> 
>>> remote: Counting objects: 5922, done.
>>> remote: Compressing objects: 100% (14/14), done.
>>> error: inflate: data stream error (unknown compression method)
>>> error: unable to unpack 6acd8f279a8b20311665f41134579b7380970446 header
>>> fatal: SHA1 COLLISION FOUND WITH 6acd8f279a8b20311665f41134579b7380970446 !
>>> fatal: index-pack failed
>>> 
>>> I would be really surprised if we discovered a SHA1 collision in a production
>>> repo. My guess is that this is somehow triggered by a network issue (see data
>>> stream error). Any tips on how to debug this?
>> 
>> Perhaps the first thing to do is to tweak the messages in builtin/index-pack.c
>> to help you identify which one of the 5 identical messages is firing.
>> 
>> My guess would be that the code saw an object that came over the
>> wire, hashed it to learn that its object name is
>> 6acd8f279a8b20311665f41134579b7380970446, noticed that it locally
>> already has the object with the same name, and tried to make sure
>> they have identical contents (otherwise, what came over the wire is
>> a successful second preimage attack).  But your loose object on disk
>> you already had was corrupt and didn't inflate correctly when
>> builtin/index-pack.c::compare_objects() or check_collision() tried
>> to.  The code saw no data, or truncated data, or
>> whatever---something different from the other data that hashed
>> down to 6acd8..., and reported a collision when there is no
>> collision.
> 
> My guess is the one in compare_objects(). The "unable to unpack" error
> comes from sha1_loose_object_info(). We'd normally then follow up with
> read_sha1_file(), which would generate its own set of errors.
> 
> But if open_istream() got a bogus value for the object size (but didn't
> correctly report an error), then it would probably quietly return 0
> bytes from read_istream() later.
> 
> I suspect this may improve things, but I haven't dug deeper to see if
> there are unwanted side effects, or if there are other spots that need
> similar treatment.
> 
> diff --git a/sha1_file.c b/sha1_file.c
> index 43990dec7..38411f90b 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
> 	if (status && oi->typep)
> 		*oi->typep = status;
> 	strbuf_release(&hdrbuf);
> -	return 0;
> +	return status;
> }
> 
> int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)

Hi Peff,

We are now seeing this in Git 2.14.1:

...
error: inflate: data stream error (unknown compression method)
error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header
error: inflate: data stream error (unknown compression method)
error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header
fatal: loose object 7b513f98a66ef9488e516e7abbc246438597c6d5 (stored in .git/objects/7b/513f98a66ef9488e516e7abbc246438597c6d5) is corrupt
fatal: The remote end hung up unexpectedly

I guess this means your fix [1] works properly :-)

At some point I will try to explore a retry mechanism for these cases.

Cheers,
Lars

[1] https://github.com/git/git/commit/93cff9a978e1c177ac3e889867004a56773301b2
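
For readers following the thread: the difference between the March report
("SHA1 COLLISION FOUND") and the output above ("loose object ... is corrupt")
comes down to whether a failed read of the local object is propagated to the
collision check. The following is a minimal C sketch of that idea, not Git's
actual code; check_incoming(), read_local_header(), struct local_object and
the result codes are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch (assumption, not Git's API). */
enum check_result {
	CHECK_SAME,           /* local copy matches the incoming object */
	CHECK_LOCAL_CORRUPT,  /* local copy could not be read */
	CHECK_COLLISION       /* both readable, same name, different content */
};

/* Hypothetical stand-in for a loose object already on disk. */
struct local_object {
	int header_ok;      /* 0 simulates a header that fails to inflate */
	const char *data;
	size_t len;
};

/*
 * Returns 0 and sets *size on success, -1 if the header is unreadable.
 * Returning 0 unconditionally here would hide the failure from the caller.
 */
static int read_local_header(const struct local_object *obj, size_t *size)
{
	if (!obj->header_ok)
		return -1;
	*size = obj->len;
	return 0;
}

/* Compare an incoming object against the local one with the same name. */
static enum check_result check_incoming(const struct local_object *local,
					const char *incoming,
					size_t incoming_len)
{
	size_t local_size;

	assert(local && incoming);

	/* A read failure is local corruption, not evidence of a collision. */
	if (read_local_header(local, &local_size) < 0)
		return CHECK_LOCAL_CORRUPT;

	/* Only two readable objects with differing content are a collision. */
	if (local_size != incoming_len ||
	    memcmp(local->data, incoming, incoming_len) != 0)
		return CHECK_COLLISION;

	return CHECK_SAME;
}
```

In this sketch, a local object whose header cannot be read yields
CHECK_LOCAL_CORRUPT before any content comparison happens, so an empty or
truncated local read is never mistaken for differing content.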


Thread overview: 15+ messages
2017-03-31 16:05 SHA1 collision in production repo?! (probably not) Lars Schneider
2017-03-31 17:27 ` Jeff King
2017-03-31 17:35 ` Junio C Hamano
2017-03-31 17:45   ` Jeff King
2017-03-31 17:48     ` Jeff King
2017-03-31 18:19       ` Junio C Hamano
2017-03-31 18:42         ` Jeff King
2017-03-31 21:16     ` Junio C Hamano
2017-04-01  8:03       ` Jeff King
2017-04-01  8:05         ` [PATCH 1/2] sha1_loose_object_info: return error for corrupted objects Jeff King
2017-04-01 17:47           ` Junio C Hamano
2017-04-01  8:09         ` [PATCH 2/2] index-pack: detect local corruption in collision check Jeff King
2017-04-01 18:04           ` Junio C Hamano
2017-09-12 16:18     ` Lars Schneider [this message]
2017-09-12 17:38       ` SHA1 collision in production repo?! (probably not) Jeff King
