From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-4.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id CBCC61F858 for ; Wed, 3 Aug 2016 17:13:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932635AbcHCRMs (ORCPT ); Wed, 3 Aug 2016 13:12:48 -0400 Received: from cloud.peff.net ([50.56.180.127]:53928 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S932481AbcHCRMr (ORCPT ); Wed, 3 Aug 2016 13:12:47 -0400 Received: (qmail 7525 invoked by uid 102); 3 Aug 2016 17:11:27 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Wed, 03 Aug 2016 13:11:27 -0400 Received: (qmail 2941 invoked by uid 107); 3 Aug 2016 17:11:55 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Wed, 03 Aug 2016 13:11:55 -0400 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Wed, 03 Aug 2016 13:11:24 -0400 Date: Wed, 3 Aug 2016 13:11:24 -0400 From: Jeff King To: Santiago Torres Cc: Git Subject: Re: [OT] USENIX paper on Git Message-ID: <20160803171124.yzm5xs67empuka7o@sigill.intra.peff.net> References: <20160801224043.4qmf56pmv27riq4i@LykOS.localdomain> <20160803145830.uwssj4uhqfemhr4o@LykOS.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20160803145830.uwssj4uhqfemhr4o@LykOS.localdomain> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Wed, Aug 03, 2016 at 10:58:31AM -0400, Santiago Torres wrote: > I will be presenting a paper regarding the Git metadata issues that we > discussed at the beginning on the year on USENIX '16. I'm writing To > make everyone in this ML aware that this work exists and to bring > everyone into the loop. > > I'm open for feedback and corrections. If anything seems odd imprecise > to the community, I can make an errata in the presentation (at least). > I'll also try to work towards making corrections anywhere if possible; > this is my first publication, so I wasn't sure if it was possible to > share things before they are published. Thankfully, this is OK in > USENIX's book. Here's the link: I read it over. As far as technical descriptions of Git, it looked OK. I found a few minor nits, but nothing worth caring about (e.g., ref storage is not quite so simple these days as 40 bytes in a file, but there is no point describing the whole packed-refs scheme in your paper, as it does not change anything with respect to your work). Here are my comments on the work itself. They're critical, but meant in a friendly way. :) As far as the attack goes, I'm still not convinced this is all that _interesting_ an attack in the real world. What it boils down to is: the ref state is not signed or authenticated in any way, so somebody who can compromise your server repo or do a MiTM can lie about where the refs are (even if individual commits are signed). So if you want to treat Git as a cryptographic end-to-end distribution mechanism, then no, it fails horribly at that. But the state of the art in source code distribution, no matter which system you use, is much less advanced than that. People download tarballs, even ones with GPG signatures, all the time without verifying their contents. Most packages distribute a sha1sum or similar (sometimes even gpg-signed), but quite often the source of authority is questionable. For example, consider somebody downloading a new package for the first time. They don't know the author in any out-of-band way, so any signatures are likely meaningless. They _might_ be depending on the source domain for some security (and using some hierarchical PKI like TLS+x.509 to be sure they're talking to that domain), but in your threat model, even well-known hosts like FSF could be compromised internally. So yes, I think the current state of affairs (especially open-source) is that people download and run possibly-compromised code all the time. But I'm not sure that lack of tool support is really the limiting factor. Or that it has turned out to be all that big a problem in practice. Anyway. As far as your solution goes, I'll admit I skimmed over the details, but it looks like basically a sequence of signatures producing a chain of state (so the tip state is signed, but you can also make sure the chain connects from your current state to what the server claims is the new tip state, and not a replay of some old state). Please correct me if that's not accurate. :) Without having thought too hard about it, it seems like you could do the same thing with push certs, as they have both a "before" and "after" for each ref. So if in addition to fetching the refs from a server, I fetch all of the push certs, I should be able to walk the chain of push certs from the one at my current state, to the one at the tip state, making sure that each one builds on the last. There are two cases that I don't think that handles, but that I also don't see are handled in your solution: - if I am cloning for the first time, I have no "current" state to base the chain from. An attacker could serve me any old signed ref state, and I have no way to know that it's old (except perhaps by seeing the wall-clock timestamp and comparing it to my clock; this isn't a proof but may be cause for suspicion if it's too old) - if there is a chain of signatures, the attacker must follow the chain, but they can always withhold links from the end. So imagine a repository has held a sequence of signed states (A, B, C), that B has a bug, C has the fix, and I am at A. An attacker can serve me B and I cannot know without out-of-band information that it is not the correct tip (because until C was created, it _was_ the correct tip). I think this is actually a generalization of the cloning issue (where state "A" is simply "I have no existing state yet"). So it seems like there is room for better tooling around push-certs (e.g., to fetch and verify the chaining automatically). I think git in general is quite weak in automatic tooling for verifications. There are room for signatures in the data format and tools for checking that the bytes haven't been touched, but there's almost nothing to tell you that signatures make any sense, tools for handling trust, etc. I think your solution also had some mechanisms for adding trusted keys as part of the hash chain. I'm not convinced that's something that should be part of git's solution in particular, and not an out-of-band thing handled as part of the PKI. Because it's really a group key management problem, and applies to anything you might sign. -Peff