[OT] USENIX paper on Git

git@vger.kernel.org mailing list mirror (one of many)

* [OT] USENIX paper on Git
       [not found] <20160801224043.4qmf56pmv27riq4i@LykOS.localdomain>
@ 2016-08-03 14:58 ` Santiago Torres
  2016-08-03 15:22   ` Johannes Schindelin
  2016-08-03 17:11   ` Jeff King
  0 siblings, 2 replies; 15+ messages in thread
From: Santiago Torres @ 2016-08-03 14:58 UTC (permalink / raw)
  To: Git

Hello everyone,

I will be presenting a paper regarding the Git metadata issues that we
discussed at the beginning of the year at USENIX '16. I'm writing to
make everyone in this ML aware that this work exists and to bring
everyone into the loop.

I'm open to feedback and corrections. If anything seems odd or imprecise
to the community, I can make an erratum in the presentation (at least).
I'll also try to work towards making corrections anywhere if possible;
this is my first publication, so I wasn't sure if it was possible to
share things before they are published. Thankfully, this is OK in
USENIX's book. Here's the link:
http://i2.cdn.turner.com/cnnnext/dam/assets/160730192650-14new-week-in-politics-super-169.jpg

I do mention work towards fixing these issues in upcoming versions of
Git. This is in reference to the issue with Git tags, although I hope to
continue working on Git in general once I have more time for it. Thanks
again for all the patience reviewing patches and discussing everything.

Thanks!
-Santiago.

P.S. Let me know if anyone is going to USENIX. I'm looking forward to
meeting!

[1] http://thread.gmane.org/gmane.comp.version-control.git/287649 *
I believe it to be this, but gmane seems to be down. 

* Re: [OT] USENIX paper on Git
  2016-08-03 14:58 ` [OT] USENIX paper on Git Santiago Torres
@ 2016-08-03 15:22   ` Johannes Schindelin
  2016-08-03 15:25     ` Santiago Torres
  2016-08-03 17:11   ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Johannes Schindelin @ 2016-08-03 15:22 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Git

Hi Santiago,

On Wed, 3 Aug 2016, Santiago Torres wrote:

> I'm open to feedback and corrections. If anything seems odd or imprecise
> to the community, I can make an erratum in the presentation (at least).
> I'll also try to work towards making corrections anywhere if possible;
> this is my first publication, so I wasn't sure if it was possible to
> share things before they are published. Thankfully, this is OK in
> USENIX's book. Here's the link:
> http://i2.cdn.turner.com/cnnnext/dam/assets/160730192650-14new-week-in-politics-super-169.jpg

While I had a good laugh, I am wondering whether this is the correct link?

Ciao,
Johannes

* Re: [OT] USENIX paper on Git
  2016-08-03 15:22   ` Johannes Schindelin
@ 2016-08-03 15:25     ` Santiago Torres
  2016-08-03 17:14       ` Stefan Beller
  0 siblings, 1 reply; 15+ messages in thread
From: Santiago Torres @ 2016-08-03 15:25 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Git

> > share things before they are published. Thankfully, this is OK in
> > USENIX's book. Here's the link:
> > http://i2.cdn.turner.com/cnnnext/dam/assets/160730192650-14new-week-in-politics-super-169.jpg
> 
> While I had a good laugh, I am wondering whether this is the correct link?

Oh my god, sorry, I meant to p, not to ctrl + v. My head is all over the
place as of late.

Here's the correct link:

http://isis.poly.edu/~jcappos/papers/torres_toto_usenixsec-2016.pdf

Thanks!
-Santiago.

* Re: [OT] USENIX paper on Git
  2016-08-03 14:58 ` [OT] USENIX paper on Git Santiago Torres
  2016-08-03 15:22   ` Johannes Schindelin
@ 2016-08-03 17:11   ` Jeff King
  2016-08-03 17:18     ` Junio C Hamano
  2016-08-03 17:45     ` Santiago Torres
  1 sibling, 2 replies; 15+ messages in thread
From: Jeff King @ 2016-08-03 17:11 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Git

On Wed, Aug 03, 2016 at 10:58:31AM -0400, Santiago Torres wrote:

> I will be presenting a paper regarding the Git metadata issues that we
> discussed at the beginning of the year at USENIX '16. I'm writing to
> make everyone in this ML aware that this work exists and to bring
> everyone into the loop.
> 
> I'm open to feedback and corrections. If anything seems odd or imprecise
> to the community, I can make an erratum in the presentation (at least).
> I'll also try to work towards making corrections anywhere if possible;
> this is my first publication, so I wasn't sure if it was possible to
> share things before they are published. Thankfully, this is OK in
> USENIX's book. Here's the link:

I read it over. As far as technical descriptions of Git, it looked OK.
I found a few minor nits, but nothing worth caring about (e.g., ref
storage is not quite so simple these days as 40 bytes in a file, but
there is no point describing the whole packed-refs scheme in your paper,
as it does not change anything with respect to your work).

Here are my comments on the work itself. They're critical, but meant in
a friendly way. :)

As far as the attack goes, I'm still not convinced this is all that
_interesting_ an attack in the real world. What it boils down to is: the
ref state is not signed or authenticated in any way, so somebody who can
compromise your server repo or do a MiTM can lie about where the refs
are (even if individual commits are signed).

So if you want to treat Git as a cryptographic end-to-end distribution
mechanism, then no, it fails horribly at that. But the state of the art
in source code distribution, no matter which system you use, is much
less advanced than that. People download tarballs, even ones with GPG
signatures, all the time without verifying their contents. Most packages
distribute a sha1sum or similar (sometimes even gpg-signed), but quite
often the source of authority is questionable.

For example, consider somebody downloading a new package for the first
time. They don't know the author in any out-of-band way, so any
signatures are likely meaningless. They _might_ be depending on the
source domain for some security (and using some hierarchical PKI like
TLS+x.509 to be sure they're talking to that domain), but in your threat
model, even well-known hosts like FSF could be compromised internally.

So yes, I think the current state of affairs (especially open-source) is
that people download and run possibly-compromised code all the time. But
I'm not sure that lack of tool support is really the limiting factor. Or
that it has turned out to be all that big a problem in practice.

Anyway. As far as your solution goes, I'll admit I skimmed over the
details, but it looks like basically a sequence of signatures producing
a chain of state (so the tip state is signed, but you can also make sure
the chain connects from your current state to what the server claims is
the new tip state, and not a replay of some old state). Please correct
me if that's not accurate. :)

Without having thought too hard about it, it seems like you could do the
same thing with push certs, as they have both a "before" and "after" for
each ref. So if in addition to fetching the refs from a server, I fetch
all of the push certs, I should be able to walk the chain of push certs
from the one at my current state, to the one at the tip state, making
sure that each one builds on the last.

There are two cases that I don't think that handles, and that I also
don't see handled in your solution:

  - if I am cloning for the first time, I have no "current" state to
    base the chain from. An attacker could serve me any old signed ref
    state, and I have no way to know that it's old (except perhaps by
    seeing the wall-clock timestamp and comparing it to my clock; this
    isn't a proof but may be cause for suspicion if it's too old)

  - if there is a chain of signatures, the attacker must follow the
    chain, but they can always withhold links from the end. So imagine a
    repository has held a sequence of signed states (A, B, C), that B
    has a bug, C has the fix, and I am at A. An attacker can serve me B
    and I cannot know without out-of-band information that it is not the
    correct tip (because until C was created, it _was_ the correct tip).

    I think this is actually a generalization of the cloning issue
    (where state "A" is simply "I have no existing state yet").
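
As a rough illustration of the chain walk and of the withholding case
above, here is a minimal Python sketch. It is a toy model only: the
RefUpdate record and the walk_chain function are hypothetical names, not
Git's push-cert format or API, and signature checking is assumed to have
happened already.

```python
from typing import NamedTuple


class RefUpdate(NamedTuple):
    """One signed ref update; a hypothetical stand-in for a push cert entry."""
    ref: str
    old: str  # object id the ref pointed at before the push
    new: str  # object id the ref points at after the push


def walk_chain(updates, ref, known, claimed_tip):
    """Return True if the updates for `ref` link `known` to `claimed_tip`."""
    by_old = {u.old: u for u in updates if u.ref == ref}
    current = known
    seen = set()
    while current != claimed_tip:
        if current in seen or current not in by_old:
            return False  # a loop, or a missing link in the chain
        seen.add(current)
        current = by_old[current].new
    return True


certs = [
    RefUpdate("refs/heads/master", "A", "B"),
    RefUpdate("refs/heads/master", "B", "C"),
]

full = walk_chain(certs, "refs/heads/master", "A", "C")        # chain connects
withheld = walk_chain(certs, "refs/heads/master", "A", "B")    # C was withheld
broken = walk_chain(certs[1:], "refs/heads/master", "A", "C")  # no link from A
print(full, withheld, broken)
```

Note that the "withheld" case still passes: the chain from A to B is
valid, so the walk alone cannot tell that a later state C exists.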

So it seems like there is room for better tooling around push-certs
(e.g., to fetch and verify the chaining automatically). I think git in
general is quite weak in automatic tooling for verifications. There are
room for signatures in the data format and tools for checking that the
bytes haven't been touched, but there's almost nothing to tell you that
signatures make any sense, tools for handling trust, etc.

I think your solution also had some mechanisms for adding trusted keys
as part of the hash chain. I'm not convinced that's something that
should be part of git's solution in particular, and not an out-of-band
thing handled as part of the PKI. Because it's really a group key
management problem, and applies to anything you might sign.

-Peff

* Re: [OT] USENIX paper on Git
  2016-08-03 15:25     ` Santiago Torres
@ 2016-08-03 17:14       ` Stefan Beller
  2016-08-03 17:22         ` Santiago Torres
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Beller @ 2016-08-03 17:14 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Johannes Schindelin, Git

On Wed, Aug 3, 2016 at 8:25 AM, Santiago Torres <santiago@nyu.edu> wrote:
>> > share things before they are published. Thankfully, this is OK in
>> > USENIX's book. Here's the link:
>> > http://i2.cdn.turner.com/cnnnext/dam/assets/160730192650-14new-week-in-politics-super-169.jpg
>>
>> While I had a good laugh, I am wondering whether this is the correct link?
>
> Oh my god, sorry, I meant to p, not to ctrl + v. My head is all over the
> place as of late.
>
> Here's the correct link:
>
> http://isis.poly.edu/~jcappos/papers/torres_toto_usenixsec-2016.pdf

In 4.1 you write:
> Finally, Git submodules are also vulnerable, as they automatically track
> a tag (or branch). If a build dependency is included in a project as a part
> of the submodule, a package might be vulnerable via an underlying library.

Submodules actually track commits, not tags or branches.

This is confusing for some users, e.g. the user intended to track
a library at version 1.1, but it tracks 1234abcd instead (which is what
1.1 points at).

Thanks,
Stefan

>
> Thanks!
> -Santiago.

* Re: [OT] USENIX paper on Git
  2016-08-03 17:11   ` Jeff King
@ 2016-08-03 17:18     ` Junio C Hamano
  2016-08-03 17:45     ` Santiago Torres
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2016-08-03 17:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Santiago Torres, Git

Jeff King <peff@peff.net> writes:

> Here are my comments on the work itself. They're critical, but meant in
> a friendly way. :)

A tl;dr version of your analysis seems to me that "you solve it the
same way as the push certificate solves it (including the limitation
the latter has)".

If that is the case, I think the solution presented in the paper is
a good one ;-).

* Re: [OT] USENIX paper on Git
  2016-08-03 17:14       ` Stefan Beller
@ 2016-08-03 17:22         ` Santiago Torres
  2016-08-03 17:35           ` Stefan Beller
  2016-08-03 17:35           ` Junio C Hamano
  0 siblings, 2 replies; 15+ messages in thread
From: Santiago Torres @ 2016-08-03 17:22 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes Schindelin, Git

On Wed, Aug 03, 2016 at 10:14:21AM -0700, Stefan Beller wrote:
> On Wed, Aug 3, 2016 at 8:25 AM, Santiago Torres <santiago@nyu.edu> wrote:
> >> > share things before they are published. Thankfully, this is OK in
> >> > USENIX's book. Here's the link:
> >> > http://i2.cdn.turner.com/cnnnext/dam/assets/160730192650-14new-week-in-politics-super-169.jpg
> >>
> >> While I had a good laugh, I am wondering whether this is the correct link?
> >
> > Oh my god, sorry, I meant to p, not to ctrl + v. My head is all over the
> > place as of late.
> >
> > Here's the correct link:
> >
> > http://isis.poly.edu/~jcappos/papers/torres_toto_usenixsec-2016.pdf
> 
> In 4.1 you write:
> > Finally, Git submodules are also vulnerable, as they automatically track
> > a tag (or branch). If a build dependency is included in a project as a part
> > of the submodule, a package might be vulnerable via an underlying library.
> 
> Submodules actually track commits, not tags or branches.
> 
> This is confusing for some users, e.g. the user intended to track
> a library at version 1.1, but it tracks 1234abcd instead (which is what
> 1.1 points at).

I'm assuming that git submodule update does update where the ref points
to, does it not?

Let me dig into this and try to take the necessary measures to correct
this.

Thanks for the feedback!

-Santiago.

* Re: [OT] USENIX paper on Git
  2016-08-03 17:22         ` Santiago Torres
@ 2016-08-03 17:35           ` Stefan Beller
  2016-08-03 18:02             ` Santiago Torres
  2016-08-03 17:35           ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Beller @ 2016-08-03 17:35 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Johannes Schindelin, Git

On Wed, Aug 3, 2016 at 10:22 AM, Santiago Torres <santiago@nyu.edu> wrote:
> On Wed, Aug 03, 2016 at 10:14:21AM -0700, Stefan Beller wrote:
>> On Wed, Aug 3, 2016 at 8:25 AM, Santiago Torres <santiago@nyu.edu> wrote:
>> >> > share things before they are published. Thankfully, this is OK in
>> >> > USENIX's book. Here's the link:
>> >> > http://i2.cdn.turner.com/cnnnext/dam/assets/160730192650-14new-week-in-politics-super-169.jpg
>> >>
>> >> While I had a good laugh, I am wondering whether this is the correct link?
>> >
>> > Oh my god, sorry, I meant to p, not to ctrl + v. My head is all over the
>> > place as of late.
>> >
>> > Here's the correct link:
>> >
>> > http://isis.poly.edu/~jcappos/papers/torres_toto_usenixsec-2016.pdf
>>
>> In 4.1 you write:
>> > Finally, Git submodules are also vulnerable, as they automatically track
>> > a tag (or branch). If a build dependency is included in a project as a part
>> > of the submodule, a package might be vulnerable via an underlying library.
>>
>> Submodules actually track commits, not tags or branches.
>>
>> This is confusing for some users, e.g. the user intended to track
>> a library at version 1.1, but it tracks 1234abcd instead (which is what
>> 1.1 points at).
>
> I'm assuming that git submodule update does update where the ref points
> to, does it not?
>
> let me dig into this and try to take the necessary measures to correct
> this
>

"git submodule update" updates to the recorded sha1, which I assume is used
by downstream users. If you are a maintainer and you want to update the
library used, you'd be interested in having "git submodule update
--remote" update the sha1 to the tracking branch, which then exposes the
attacks presented.
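
To make the difference concrete, here is a toy Python model of the two
behaviours. It is only a sketch under stated assumptions: the Submodule
class and its fields are hypothetical and do not reflect how Git stores
or updates submodules internally.

```python
class Submodule:
    """Toy model: a superproject pins a submodule to one recorded sha1."""

    def __init__(self, recorded_sha, remote_refs, branch="master"):
        self.recorded_sha = recorded_sha  # sha1 recorded in the superproject
        self.remote_refs = remote_refs    # ref name -> sha1 the remote advertises
        self.branch = branch              # tracking branch used with remote=True
        self.checked_out = None

    def update(self, remote=False):
        """Model "git submodule update", optionally with --remote."""
        if remote:
            # Follows whatever the remote says the branch points at.
            self.checked_out = self.remote_refs[self.branch]
        else:
            # Ignores remote refs and uses the recorded sha1.
            self.checked_out = self.recorded_sha
        return self.checked_out

    def is_modified(self):
        """True if the checkout differs from what the superproject records."""
        return self.checked_out is not None and self.checked_out != self.recorded_sha


# The remote advertises a (possibly manipulated) branch tip.
sub = Submodule("1234abcd", {"master": "deadbeef"})

plain = sub.update()
plain_modified = sub.is_modified()
followed = sub.update(remote=True)
followed_modified = sub.is_modified()
print(plain, plain_modified, followed, followed_modified)
```

In this model only the remote=True path is influenced by what the remote
advertises, and the result then differs from the recorded sha1.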

Thanks,
Stefan

* Re: [OT] USENIX paper on Git
  2016-08-03 17:22         ` Santiago Torres
  2016-08-03 17:35           ` Stefan Beller
@ 2016-08-03 17:35           ` Junio C Hamano
  2016-08-03 17:58             ` Santiago Torres
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2016-08-03 17:35 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Stefan Beller, Johannes Schindelin, Git

Santiago Torres <santiago@nyu.edu> writes:

>> Submodules actually track commits, not tags or branches.
>> 
>> This is confusing for some users, e.g. the user intended to track
>> a library at version 1.1, but it tracks 1234abcd instead (which is what
>> 1.1 points at).
>
> I'm assuming that git submodule update does update where the ref points
> to, does it not?

I think you may configure the command to do so, instead of the
default "detach at the commit recorded in the superproject".

But then your tree immediately will be marked by "git status" as
"modified" at such a submodule, meaning "what you have in the
working tree is different from what the commit in the superproject
wants you to have", I would think.

* Re: [OT] USENIX paper on Git
  2016-08-03 17:11   ` Jeff King
  2016-08-03 17:18     ` Junio C Hamano
@ 2016-08-03 17:45     ` Santiago Torres
  2016-08-03 17:58       ` Jeff King
  2016-08-03 20:03       ` David Lang
  1 sibling, 2 replies; 15+ messages in thread
From: Santiago Torres @ 2016-08-03 17:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Git

Hello,

> Here are my comments on the work itself. They're critical, but meant in
> a friendly way. :)
> 

Thanks! If anything, the community here has been incredibly helpful in
helping me understand everything.

> As far as the attack goes, I'm still not convinced this is all that
> _interesting_ an attack in the real world. What it boils down to is: the
> ref state is not signed or authenticated in any way, so somebody who can
> compromise your server repo or do a MiTM can lie about where the refs
> are (even if individual commits are signed).

Yup, that's pretty much it. I do agree that some of these attacks feel
tangential in the way it should be used. I also agree that, if git is
used how the Linux kernel or git.git does, then these attacks are rather
hard to pull off. 

> 
> So if you want to treat Git as a cryptographic end-to-end distribution
> mechanism, then no, it fails horribly at that. But the state of the art
> in source code distribution, no matter which system you use, is much
> less advanced than that. People download tarballs, even ones with GPG
> signatures, all the time without verifying their contents. Most packages
> distribute a sha1sum or similar (sometimes even gpg-signed), but quite
> often the source of authority is questionable.

Yes, this happens an awful lot of times. We did some work with python's
pypi last year, and we found out that less than 1% of people actually
downloaded the gpg signature for the package they are retrieving[1].

> 
> For example, consider somebody downloading a new package for the first
> time. They don't know the author in any out-of-band way, so any
> signatures are likely meaningless. They _might_ be depending on the
> source domain for some security (and using some hierarchical PKI like
> TLS+x.509 to be sure they're talking to that domain), but in your threat
> model, even well-known hosts like FSF could be compromised internally.
> 
> So yes, I think the current state of affairs (especially open-source) is
> that people download and run possibly-compromised code all the time. But
> I'm not sure that lack of tool support is really the limiting factor. Or
> that it has turned out to be all that big a problem in practice.

I couldn't agree more. I feel that OSS is slowly moving towards a more
cryptographically robust, trust-based way of doing things, which I find
pleasing.

> 
> Anyway. As far as your solution goes, I'll admit I skimmed over the
> details, but it looks like basically a sequence of signatures producing
> a chain of state (so the tip state is signed, but you can also make sure
> the chain connects from your current state to what the server claims is
> the new tip state, and not a replay of some old state). Please correct
> me if that's not accurate. :)

Yeah, this sounds about right.

> 
> Without having thought too hard about it, it seems like you could do the
> same thing with push certs, as they have both a "before" and "after" for
> each ref. So if in addition to fetching the refs from a server, I fetch
> all of the push certs, I should be able to walk the chain of push certs
> from the one at my current state, to the one at the tip state, making
> sure that each one builds on the last.

Yeah, when we looked at push certs I actually thought that "chaining" the
certs could achieve a similar effect to the solution I described.

> 
> There are two cases that I don't think that handles, but that I also
> don't see are handled in your solution:
> 
>   - if I am cloning for the first time, I have no "current" state to
>     base the chain from. An attacker could serve me any old signed ref
>     state, and I have no way to know that it's old (except perhaps by
>     seeing the wall-clock timestamp and comparing it to my clock; this
>     isn't a proof but may be cause for suspicion if it's too old)

Hmm, I didn't think about it. Let me jot it down. This sounds
interesting. I recall you mentioning metadata expiration back then.

> 
>   - if there is a chain of signatures, the attacker must follow the
>     chain, but they can always withhold links from the end. So imagine a
>     repository has held a sequence of signed states (A, B, C), that B
>     has a bug, C has the fix, and I am at A. An attacker can serve me B
>     and I cannot know without out-of-band information that it is not the
>     correct tip (because until C was created, it _was_ the correct tip).

I think we address this by using the "nonce bag". We basically force the
server to fork the user's history if it withholds changes from one group
to the other. By doing so, the user's nonce can't be added to any other
history. I don't think this is noticeable from start though.

> 
>     I think this is actually a generalization of the cloning issue
>     (where state "A" is simply "I have no existing state yet").
> 
> So it seems like there is room for better tooling around push-certs
> (e.g., to fetch and verify the chaining automatically). 

Yeah, I think that in-band cert distribution and automatic validation
may be a desirable feature, but there may be reasons as to why this is
not appealing. (I already have some in mind).

> I think git in general is quite weak in automatic tooling for
> verifications. There is room for signatures in the data format and
> tools for checking that the bytes haven't been touched, but there's
> almost nothing to tell you that signatures make any sense, tools for
> handling trust, etc.

Yes, from our previous interactions, it seems that git's philosophy
focuses on providing the right information to users/tools and letting those
tools make the call of whether something is fishy. I don't think this is
necessarily bad.

> 
> I think your solution also had some mechanisms for adding trusted keys
> as part of the hash chain. I'm not convinced that's something that
> should be part of git's solution in particular, and not an out-of-band
> thing handled as part of the PKI. Because it's really a group key
> management problem, and applies to anything you might sign.

I see. What about, for example, having an official "overlay" on git for
signing and verification of a repository? (e.g., similar to what
monotone does). I see that other VCS's have a plugin mechanism, and they
host official plugins.

Thanks a lot!
-Santiago.

[1] https://isis.poly.edu/~jcappos/papers/kuppusamy_nsdi_16.pdf

* Re: [OT] USENIX paper on Git
  2016-08-03 17:35           ` Junio C Hamano
@ 2016-08-03 17:58             ` Santiago Torres
  0 siblings, 0 replies; 15+ messages in thread
From: Santiago Torres @ 2016-08-03 17:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Johannes Schindelin, Git

On Wed, Aug 03, 2016 at 10:35:54AM -0700, Junio C Hamano wrote:
> Santiago Torres <santiago@nyu.edu> writes:
> 
> >> Submodules actually track commits, not tags or branches.
> >> 
> >> This is confusing for some users, e.g. the user intended to track a
> >> library at version 1.1, but it tracks 1234abcd instead (which is
> >> what 1.1 points at).
> >
> > I'm assuming that git submodule update does update where the ref
> > points to, does it not?
> 
> I think you may configure the command to do so, instead of the default
> "detach at the commit recorded in the superproject".
> 
> But then your tree immediately will be marked by "git status" as
> "modified" at such a submodule, meaning "what you have in the working
> tree is different from what the commit in the superproject wants you
> to have", I would think.
> 

Ah, I see where my confusion is. Thanks for the correction :)

-Santiago.

* Re: [OT] USENIX paper on Git
  2016-08-03 17:45     ` Santiago Torres
@ 2016-08-03 17:58       ` Jeff King
  2016-08-03 18:31         ` Santiago Torres
  2016-08-03 20:03       ` David Lang
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff King @ 2016-08-03 17:58 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Git

On Wed, Aug 03, 2016 at 01:45:00PM -0400, Santiago Torres wrote:

> >   - if there is a chain of signatures, the attacker must follow the
> >     chain, but they can always withhold links from the end. So imagine a
> >     repository has held a sequence of signed states (A, B, C), that B
> >     has a bug, C has the fix, and I am at A. An attacker can serve me B
> >     and I cannot know without out-of-band information that it is not the
> >     correct tip (because until C was created, it _was_ the correct tip).
> 
> I think we address this by using the "nonce bag". We basically force the
> > server to fork the user's history if it withholds changes from one group
> to the other. By doing so, the user's nonce can't be added to any other
> history. I don't think this is noticeable from start though.

OK, I think that is in the details I glossed over. ;)

If you are effectively preventing the server from showing different
states to different people, then at least that lets the "main"
developers notice problems (because at least one of them already saw "C"
because they wrote it).

> > I think git in general is quite weak in automatic tooling for
> > verifications. There is room for signatures in the data format and
> > tools for checking that the bytes haven't been touched, but there's
> > almost nothing to tell you that signatures make any sense, tools for
> > handling trust, etc.
> 
> Yes, from our previous interactions, it seems that git's philosophy
> > focuses on providing the right information to users/tools and letting those
> tools make the call of whether something is fishy. I don't think this is
> necessarily bad.

I think it's half philosophy (git strives for flexibility, and so aims
to provide low-level tooling that you can build on), and half that
nobody has bothered to implement a sane set of automatic checks.

There's definitely some low-hanging fruit there. I think we've discussed
things like checking that verifying refs/tags/v1.0.0 actually gets you a
tag that says "v1.0.0" in it. But I'd love to see a framework either
built into or on top of git that would implement sensible policies, and
make out-of-the-box verification easy to do. Then people might actually
use it. :)
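
One such check could look roughly like the following Python sketch. It
assumes the raw text of an annotated tag object is already in hand (for
example as printed by "git cat-file tag") and that the signature has
been verified separately; the function name tag_name_matches and the
sample object below are made up for illustration.

```python
def tag_name_matches(ref, tag_object_text):
    """Check that the "tag" header inside a tag object matches the ref name."""
    prefix = "refs/tags/"
    if not ref.startswith(prefix):
        return False
    expected = ref[len(prefix):]
    # Headers come first; a blank line separates them from the message.
    header, _, _ = tag_object_text.partition("\n\n")
    for line in header.splitlines():
        if line.startswith("tag "):
            return line[len("tag "):] == expected
    return False  # no "tag" header found


# Made-up tag object for illustration only.
sample = (
    "object 1234abcd1234abcd1234abcd1234abcd1234abcd\n"
    "type commit\n"
    "tag v1.0.0\n"
    "tagger A U Thor <author@example.com> 1470240000 +0000\n"
    "\n"
    "Release v1.0.0\n"
)

ok = tag_name_matches("refs/tags/v1.0.0", sample)
swapped = tag_name_matches("refs/tags/v1.0.1", sample)
print(ok, swapped)
```

The second call models a server that points refs/tags/v1.0.1 at a
validly signed but different tag object.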

> > I think your solution also had some mechanisms for adding trusted keys
> > as part of the hash chain. I'm not convinced that's something that
> > should be part of git's solution in particular, and not an out-of-band
> > thing handled as part of the PKI. Because it's really a group key
> > management problem, and applies to anything you might sign.
> 
> I see. What about, for example, having an official "overlay" on git for
> signing and verification of a repository? (e.g., similar to what
> monotone does). I see that other VCS's have a plugin mechanism, and they
> host official plugins.

In general, if something is more general than git, I'd like to see a
general tool address it, and then add support to git to make use of the
tool.

For group key management, I specifically was wondering if you could do
something like:

  - start with some seed GPG keys for the project

  - existing keys can sign or revoke certificates to add or remove other
    keys to/from the project; you could even require a threshold of
    signatures, etc.

  - those keys could be used for signing git pushes, but also for other
    things, like signing tarballs, used as encryption keys for sending
    for-developers-eyes-only security reports, etc

    You'd want a tool that asks not just "is this signed" but "is this
    signed _by a key that is valid for this project_".

And then git support would just consist of feeding signatures to
"gpg-group --project=..." instead of "gpg". Management of the group
would be out-of-band from git, which is in some ways good and in some
ways bad.
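
A naive version of that group logic might be sketched in Python as
below. Everything here is hypothetical: "gpg-group" above is only an
imagined tool, the KeyGroup class is invented for illustration, keys are
plain strings, and no real cryptography is performed (the signers of
each certificate are assumed to have been verified elsewhere).

```python
class KeyGroup:
    """Toy model of a project key group with threshold add/revoke."""

    def __init__(self, seed_keys, threshold=1):
        self.valid = set(seed_keys)  # keys currently valid for the project
        self.threshold = threshold   # valid signers required for a change

    def _enough(self, signers):
        # Only signers that are currently valid count towards the threshold.
        return len(set(signers) & self.valid) >= self.threshold

    def add_key(self, key, signers):
        if not self._enough(signers):
            return False
        self.valid.add(key)
        return True

    def revoke_key(self, key, signers):
        if not self._enough(signers):
            return False
        self.valid.discard(key)
        return True

    def is_valid_signer(self, key):
        """Answers "signed by a key that is valid for this project?"."""
        return key in self.valid


group = KeyGroup({"alice", "bob"}, threshold=2)
rejected = group.add_key("carol", ["alice"])           # below threshold
accepted = group.add_key("carol", ["alice", "bob"])    # two valid signers
revoked = group.revoke_key("bob", ["alice", "carol"])
print(rejected, accepted, revoked, group.is_valid_signer("bob"))
```

This sketch has no ordering or chaining of the add/revoke certificates,
so it does nothing about withheld updates.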

I think a naive implementation would be pretty easy, but I've glossed
over all of the chaining properties we've discussed. So whatever
mechanism you use to receive updates to the key-group would have all the
same problems (e.g., withholding revocations of compromised keys). It's
still orders of magnitude ahead of what's currently happening
day-to-day. :)

-Peff

* Re: [OT] USENIX paper on Git
  2016-08-03 17:35           ` Stefan Beller
@ 2016-08-03 18:02             ` Santiago Torres
  0 siblings, 0 replies; 15+ messages in thread
From: Santiago Torres @ 2016-08-03 18:02 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes Schindelin, Git

On Wed, Aug 03, 2016 at 10:35:39AM -0700, Stefan Beller wrote:
> On Wed, Aug 3, 2016 at 10:22 AM, Santiago Torres <santiago@nyu.edu> wrote:
> > On Wed, Aug 03, 2016 at 10:14:21AM -0700, Stefan Beller wrote:
> >> On Wed, Aug 3, 2016 at 8:25 AM, Santiago Torres <santiago@nyu.edu> wrote:
> >> >> > share things before they are published. Thankfully, this is OK in
> >> >> > USENIX's book. Here's the link:
> >> >> > http://i2.cdn.turner.com/cnnnext/dam/assets/160730192650-14new-week-in-politics-super-169.jpg
> >> >>
> >> >> While I had a good laugh, I am wondering whether this is the correct link?
> >> >
> >> > Oh my god, sorry, I meant to p, not to ctrl + v. My head is all over the
> >> > place as of late.
> >> >
> >> > Here's the correct link:
> >> >
> >> > http://isis.poly.edu/~jcappos/papers/torres_toto_usenixsec-2016.pdf
> >>
> >> In 4.1 you write:
> >> > Finally, Git submodules are also vulnerable, as they automatically track
> >> > a tag (or branch). If a build dependency is included in a project as a part
> >> > of the submodule, a package might be vulnerable via an underlying library.
> >>
> >> Submodules actually track commits, not tags or branches.
> >>
> >> This is confusing for some users, e.g. the user intended to track
> >> a library at version 1.1, but it tracks 1234abcd instead (which is what
> >> 1.1 points at).
> >
> > I'm assuming that git submodule update does update where the ref points
> > to, does it not?
> >
> > let me dig into this and try to take the necessary measures to correct
> > this
> >
> 
> "git submodule update" updates to the recorded sha1, which I assume is used
> by downstream users. If you are a maintainer and you want to update the
> library used, you'd be interested in having "git submodule update
> --remote" update the sha1 to the tracking branch, which then exposes the
> attacks presented.

I see, I just tried to reproduce this, and it seems that, with a simple
git clone --recursive [path], the submodule fetched does not update to
the "malicious ref." You're right.

So, in the end, git submodule update --remote also requires you to
create a new tree, right? Then this attack wouldn't be possible by just
fiddling with the refs if signing is in place, right?

Thanks for clarifying!
-Santiago.

* Re: [OT] USENIX paper on Git
  2016-08-03 17:58       ` Jeff King
@ 2016-08-03 18:31         ` Santiago Torres
  0 siblings, 0 replies; 15+ messages in thread
From: Santiago Torres @ 2016-08-03 18:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Git

On Wed, Aug 03, 2016 at 01:58:54PM -0400, Jeff King wrote:
> On Wed, Aug 03, 2016 at 01:45:00PM -0400, Santiago Torres wrote:
> 
> > >   - if there is a chain of signatures, the attacker must follow the
> > >     chain, but they can always withhold links from the end. So imagine a
> > >     repository has held a sequence of signed states (A, B, C), that B
> > >     has a bug, C has the fix, and I am at A. An attacker can serve me B
> > >     and I cannot know without out-of-band information that it is not the
> > >     correct tip (because until C was created, it _was_ the correct tip).
> > 
> > I think we address this by using the "nonce bag". We basically force the
> > > server to fork the user's history if it withholds changes from one group
> > to the other. By doing so, the user's nonce can't be added to any other
> > history. I don't think this is noticeable from start though.
> 
> OK, I think that is in the details I glossed over. ;)
> 
> If you are effectively preventing the server from showing different
> states to different people, then at least that lets the "main"
> developers notice problems (because at least one of them already saw "C"
> because they wrote it).

yeah, that was one of our assumptions. I think it's unrealistic to think
that people do not coordinate over mailing lists or other means.
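
The withholding scenario quoted above (states A, B, C, with the client
at A) can be sketched as follows. This is a minimal illustration under
stated assumptions: an HMAC with a made-up shared key stands in for a
real signature, and make_state / verify_from are hypothetical helpers,
not anything from Git or from the paper.

```python
import hashlib
import hmac

KEY = b"project-signing-key"  # assumption: stand-in for a real signing key


def make_state(name, parent_id):
    """Create a 'signed' state that links to its parent by id."""
    payload = f"{name}|{parent_id}".encode()
    return {
        "id": hashlib.sha256(payload).hexdigest(),
        "name": name,
        "parent": parent_id,
        "sig": hmac.new(KEY, payload, hashlib.sha256).hexdigest(),
    }


def verify_from(known_id, tip, store):
    """Check ids and signatures from `tip` back to the state we already have."""
    cur = tip
    while True:
        payload = f"{cur['name']}|{cur['parent']}".encode()
        expected_sig = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
        if cur["id"] != hashlib.sha256(payload).hexdigest():
            return False
        if not hmac.compare_digest(expected_sig, cur["sig"]):
            return False
        if cur["id"] == known_id:
            return True
        if cur["parent"] not in store:
            return False
        cur = store[cur["parent"]]


a = make_state("A", "")
b = make_state("B", a["id"])   # has the bug
c = make_state("C", b["id"])   # has the fix
store = {s["id"]: s for s in (a, b, c)}

stale_ok = verify_from(a["id"], b, store)  # server withholds C, serves B
fresh_ok = verify_from(a["id"], c, store)  # honest server serves C
```

Both checks pass, which is the point: the chain verifies for B as well
as for C, so the signatures alone cannot tell the client that B is no
longer the tip.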

> 
> > > I think git in general is quite weak in automatic tooling for
> > > > verifications. There is room for signatures in the data format and
> > > tools for checking that the bytes haven't been touched, but there's
> > > almost nothing to tell you that signatures make any sense, tools for
> > > handling trust, etc.
> > 
> > Yes, from our previous interactions, it seems that git's philosophy
> > focuses on providing the right information to users/tools and let those
> > tools make the call of whether something is fishy. I don't think this is
> > necessarily bad.
> 
> I think it's half philosophy (git strives for flexibility, and so aims
> to provide low-level tooling that you can build on), and half that
> nobody has bothered to implement a sane set of automatic checks.
> 
> There's definitely some low-hanging fruit there. I think we've discussed
> things like checking that verifying refs/tags/v1.0.0 actually gets you a
> tag that says "v1.0.0" in it. But I'd love to see a framework either
> built into or on top of git that would implement sensible policies, and
> make out-of-the-box verification easy to do. Then people might actually
> use it. :)

Yeah, that's one of the long-term goals with my PhD, but it's still in
the early stages, and I don't have much done yet in that field. I can of
course share this around once it's more mature if that's ok with people
in here :)
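
As a rough illustration of the low-hanging check mentioned above (that
refs/tags/v1.0.0 really holds a tag that says "v1.0.0"), something like
the following could run after the signature itself has been verified.
It is only a sketch: the helper name is made up, the sample tag text and
object id are fabricated for the example, and the text is assumed to
have been obtained with something like "git cat-file -p <tag>".

```python
def tag_name_matches_ref(ref, tag_object_text):
    """Return True if the tag object's `tag` header matches the ref name."""
    prefix = "refs/tags/"
    if not ref.startswith(prefix):
        return False
    expected = ref[len(prefix):]
    for line in tag_object_text.split("\n"):
        if line == "":
            break  # a blank line ends the header section
        key, _, value = line.partition(" ")
        if key == "tag":
            return value == expected
    return False  # no `tag` header found


# Fabricated sample data; the object id and tagger are placeholders.
SAMPLE_TAG = (
    "object 0123456789abcdef0123456789abcdef01234567\n"
    "type commit\n"
    "tag v1.0.0\n"
    "tagger A Developer <dev@example.com> 1470000000 -0400\n"
    "\n"
    "Release v1.0.0\n"
)

same_name = tag_name_matches_ref("refs/tags/v1.0.0", SAMPLE_TAG)
moved_tag = tag_name_matches_ref("refs/tags/v1.0.1", SAMPLE_TAG)
```

Here same_name is true, while moved_tag is false: a validly signed
v1.0.0 tag served under another ref name fails the check.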

> 
> > > I think your solution also had some mechanisms for adding trusted keys
> > > as part of the hash chain. I'm not convinced that's something that
> > > should be part of git's solution in particular, and not an out-of-band
> > > thing handled as part of the PKI. Because it's really a group key
> > > management problem, and applies to anything you might sign.
> > 
> > I see. What about, for example, having an official "overlay" on git for
> > signing and verification of a repository? (e.g., similar to what
> > monotone does). I see that other VCS's have a plugin mechanism, and they
> > host official plugins.
> 
> In general, if something is more general than git, I'd like to see a
> general tool address it, and then add support to git to make use of the
> tool.
> 
> For group key management, I specifically was wondering if you could do
> something like:
> 
>   - start with some seed GPG keys for the project
> 
>   - existing keys can sign or revoke certificates to add or remove other
>     keys to/from the project; you could even require a threshold of
>     signatures, etc.
> 
>   - those keys could be used for signing git pushes, but also for other
>     things, like signing tarballs, used as encryption keys for sending
>     for-developers-eyes-only security reports, etc
> 
>     You'd want a tool that asks not just "is this signed" but "is this
>     signed _by a key that is valid for this project_".

Yep and also "is this signed thing the thing I should be looking at?"
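
A naive version of the scheme sketched in the list above might look like
this. It is an assumption-heavy toy: key names are plain strings, a
"certificate" is just a tuple of (action, subject, signers), no real
cryptography is involved, and valid_keys / signed_for_project are
hypothetical helpers.

```python
def valid_keys(seed_keys, certificates, threshold=1):
    """Replay add/revoke certificates in order, starting from the seed keys.

    A certificate only takes effect if at least `threshold` of its
    signers are valid for the project at that point.
    """
    valid = set(seed_keys)
    for action, subject, signers in certificates:
        approvals = len(set(signers) & valid)
        if approvals < threshold:
            continue  # not enough currently valid signers; ignore it
        if action == "add":
            valid.add(subject)
        elif action == "revoke":
            valid.discard(subject)
    return valid


def signed_for_project(signer, signature_ok, keys):
    """Not just 'is this signed' but 'signed by a key valid for this project'."""
    return signature_ok and signer in keys


seeds = {"alice", "bob"}
certs = [
    ("add", "carol", {"alice", "bob"}),     # two valid signers: accepted
    ("add", "mallory", {"mallory"}),        # self-signed: ignored
    ("revoke", "bob", {"alice", "carol"}),  # two valid signers: accepted
]
keys = valid_keys(seeds, certs, threshold=2)
```

With a threshold of 2, the resulting set holds alice and carol only, so
a cryptographically valid signature from bob or mallory would still be
rejected for this project.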

> 
> And then git support would just consist of feeding signatures to
> "gpg-group --project=..." instead of "gpg". Management of the group
> would be out-of-band from git, which is in some ways good and in some
> ways bad.

yep, what I like about an in-band solution is that it is easy to piggyback
on existing git mechanisms (e.g., git ref backend). 

> 
> I think a naive implementation would be pretty easy, but I've glossed
> over all of the chaining properties we've discussed. So whatever
> mechanism you use to receive updates to the key-group would have all the
> same problems (e.g., withholding revocations of compromised keys). It's
> still orders of magnitude ahead of what's currently happening
> day-to-day. :)

yeah, a naive implementation may be easy, but there are little details
to consider when trying to get something more robust. I don't see why
sketching something would be a bad idea though.

Thanks all for all your feedback! I'll keep this mail archived around to
revisit it in the future; all of this seems really interesting/helpful. 

-Santiago.


* Re: [OT] USENIX paper on Git
  2016-08-03 17:45     ` Santiago Torres
  2016-08-03 17:58       ` Jeff King
@ 2016-08-03 20:03       ` David Lang
  1 sibling, 0 replies; 15+ messages in thread
From: David Lang @ 2016-08-03 20:03 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Jeff King, Git

On Wed, 3 Aug 2016, Santiago Torres wrote:

>> So if you want to treat Git as a cryptographic end-to-end distribution
>> mechanism, then no, it fails horribly at that. But the state of the art
>> in source code distribution, no matter which system you use, is much
>> less advanced than that. People download tarballs, even ones with GPG
>> signatures, all the time without verifying their contents. Most packages
>> distribute a sha1sum or similar (sometimes even gpg-signed), but quite
>> often the source of authority is questionable.
>
> Yes, this happens an awful lot of times. We did some work with python's
> pypi last year, and we found out that less than 1% of people actually
> downloaded the gpg signature for the package they are retrieving[1].
>
>>
>> For example, consider somebody downloading a new package for the first
>> time. They don't know the author in any out-of-band way, so any
>> signatures are likely meaningless. They _might_ be depending on the
>> source domain for some security (and using some hierarchical PKI like
>> TLS+x.509 to be sure they're talking to that domain), but in your threat
>> model, even well-known hosts like FSF could be compromised internally.
>>
>> So yes, I think the current state of affairs (especially open-source) is
>> that people download and run possibly-compromised code all the time. But
>> I'm not sure that lack of tool support is really the limiting factor. Or
>> that it has turned out to be all that big a problem in practice.
>
> I couldn't agree more. I feel that OSS is slowly moving towards a more
> cryptographically robust, trust-based way of doing things, which I find
> pleasing.

It's too easy to look at this from purely a technical, cryptographic point of 
view and miss a very important point.

It may be very easy to see that this was signed by "cool-internet-name", but how
can I tell if this is really Joe Blow the developer? And if it is, I still have
no way of knowing if he's working for the NSA or not.

The lack of meaningful termination of the signatures to the real world is why so 
few people bother to check package signatures, etc.

David Lang


Thread overview: 15+ messages
     [not found] <20160801224043.4qmf56pmv27riq4i@LykOS.localdomain>
2016-08-03 14:58 ` [OT] USENIX paper on Git Santiago Torres
2016-08-03 15:22   ` Johannes Schindelin
2016-08-03 15:25     ` Santiago Torres
2016-08-03 17:14       ` Stefan Beller
2016-08-03 17:22         ` Santiago Torres
2016-08-03 17:35           ` Stefan Beller
2016-08-03 18:02             ` Santiago Torres
2016-08-03 17:35           ` Junio C Hamano
2016-08-03 17:58             ` Santiago Torres
2016-08-03 17:11   ` Jeff King
2016-08-03 17:18     ` Junio C Hamano
2016-08-03 17:45     ` Santiago Torres
2016-08-03 17:58       ` Jeff King
2016-08-03 18:31         ` Santiago Torres
2016-08-03 20:03       ` David Lang
