From: Lars Schneider <larsxschneider@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, Git List <git@vger.kernel.org>
Subject: Re: SHA1 collision in production repo?! (probably not)
Date: Tue, 12 Sep 2017 18:18:32 +0200
Message-ID: <512E7480-E923-4EBF-BA9D-1FEEB99B8BA6@gmail.com>
In-Reply-To: <20170331174515.j2ruifuigskyvucc@sigill.intra.peff.net>
> On 31 Mar 2017, at 19:45, Jeff King <peff@peff.net> wrote:
>
> On Fri, Mar 31, 2017 at 10:35:06AM -0700, Junio C Hamano wrote:
>
>> Lars Schneider <larsxschneider@gmail.com> writes:
>>
>>> Hi,
>>>
>>> I just got a report with the following output after a "git fetch" operation
>>> using Git 2.11.0.windows.3 [1]:
>>>
>>> remote: Counting objects: 5922, done.
>>> remote: Compressing objects: 100% (14/14), done.
>>> error: inflate: data stream error (unknown compression method)
>>> error: unable to unpack 6acd8f279a8b20311665f41134579b7380970446 header
>>> fatal: SHA1 COLLISION FOUND WITH 6acd8f279a8b20311665f41134579b7380970446 !
>>> fatal: index-pack failed
>>>
>>> I would be really surprised if we discovered a SHA1 collision in a production
>>> repo. My guess is that this is somehow triggered by a network issue (see data
>>> stream error). Any tips how to debug this?
>>
>> Perhaps the first thing to do is to tweak the messages in builtin/index-pack.c
>> to help you identify which one of identical 5 messages is firing.
>>
>> My guess would be that the code saw an object that came over the
>> wire, hashed it to learn that its object name is
>> 6acd8f279a8b20311665f41134579b7380970446, noticed that it locally
>> already has the object with the same name, and tried to make sure
>> they have identical contents (otherwise, what came over the wire is
>> a successful second preimage attack). But your loose object on disk
>> you already had was corrupt and didn't inflate correctly when
>> builtin/index-pack.c::compare_objects() or check_collision() tried
>> to. The code saw no data, or truncated data, or
>> whatever---something different from what the other data that hashed
>> down to 6acd8..., and reported a collision when there is no
>> collision.
>
> My guess is the one in compare_objects(). The "unable to unpack" error
> comes from sha1_loose_object_info(). We'd normally then follow up with
> read_sha1_file(), which would generate its own set of errors.
>
> But if open_istream() got a bogus value for the object size (but didn't
> correctly report an error), then it would probably quietly return 0
> bytes from read_istream() later.
>
> I suspect this may improve things, but I haven't dug deeper to see if
> there are unwanted side effects, or if there are other spots that need
> similar treatment.
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 43990dec7..38411f90b 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
>  	if (status && oi->typep)
>  		*oi->typep = status;
>  	strbuf_release(&hdrbuf);
> -	return 0;
> +	return status;
>  }
>
>  int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)

Hi Peff,

we are seeing this now in Git 2.14.1:

...
error: inflate: data stream error (unknown compression method)
error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header
error: inflate: data stream error (unknown compression method)
error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header
fatal: loose object 7b513f98a66ef9488e516e7abbc246438597c6d5 (stored in .git/objects/7b/513f98a66ef9488e516e7abbc246438597c6d5) is corrupt
fatal: The remote end hung up unexpectedly

I guess this means your fix [1] works properly :-)
At some point I will try to explore a retry mechanism for these cases.

Cheers,
Lars

[1] https://github.com/git/git/commit/93cff9a978e1c177ac3e889867004a56773301b2
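
For anyone reading the thread later, here is a rough sketch of why the quoted one-line change turns a bogus "SHA1 COLLISION" into a "loose object is corrupt" report. It is a toy model, not Git's code: struct toy_object, toy_object_info_buggy(), toy_object_info_fixed() and toy_check_collision() are all made-up names, and the byte comparison only stands in for whatever index-pack really does. The point is just that an info function which drops its error lets the caller compare the incoming data against zero bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a local loose object (not Git's data structure). */
struct toy_object {
	const char *data; /* inflated contents, or NULL if inflation failed */
	size_t len;
};

/* Models the old behavior: the error is computed but never returned. */
static int toy_object_info_buggy(const struct toy_object *obj, size_t *sizep)
{
	int status = obj->data ? 0 : -1;

	*sizep = obj->data ? obj->len : 0;
	(void)status; /* dropped, like the old "return 0" */
	return 0;
}

/* Models the quoted change: the error reaches the caller. */
static int toy_object_info_fixed(const struct toy_object *obj, size_t *sizep)
{
	int status = obj->data ? 0 : -1;

	*sizep = obj->data ? obj->len : 0;
	return status;
}

enum toy_result { TOY_SAME, TOY_COLLISION, TOY_LOCAL_CORRUPT };

typedef int (*toy_info_fn)(const struct toy_object *, size_t *);

/*
 * Compare an incoming object against the local copy that has the same name.
 * If the info function hides a read failure, the local copy looks like an
 * empty object and any non-empty incoming data is reported as a collision.
 */
static enum toy_result toy_check_collision(const struct toy_object *local,
					   const char *incoming,
					   size_t incoming_len,
					   toy_info_fn info)
{
	size_t local_len;

	if (info(local, &local_len) < 0)
		return TOY_LOCAL_CORRUPT;
	if (local_len != incoming_len ||
	    (local_len && memcmp(local->data, incoming, local_len)))
		return TOY_COLLISION;
	return TOY_SAME;
}
```

Calling toy_check_collision() on an object whose data is NULL gives TOY_COLLISION with the buggy info function and TOY_LOCAL_CORRUPT with the fixed one, which roughly matches the difference between the 2.11 and 2.14.1 output above.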