git@vger.kernel.org list mirror (unofficial, one of many)
* Preserving the ability to have both SHA1 and SHA256 signatures
@ 2021-05-08  2:22 dwh
  2021-05-08  6:39 ` Christian Couder
  2021-05-09  0:19 ` brian m. carlson
  0 siblings, 2 replies; 21+ messages in thread
From: dwh @ 2021-05-08  2:22 UTC (permalink / raw)
  To: git

Hi Everybody,

I was reading through the
Documentation/technical/hash-function-transition.txt doc and realized
that the plan is to support allowing BOTH SHA1 and SHA256 signatures to
exist in a single object:

> Signed Commits
> 1. using SHA-1 only, as in existing signed commit objects
> 2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
>   fields.
> 3. using only SHA-256, by only using the gpgsig-sha256 field.
>
> Signed Tags
> 1. using SHA-1 only, as in existing signed tag objects
> 2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
>   signature.
> 3. using only SHA-256, by only using the gpgsig-sha256 field.

The design that I'm working on only supports a single signature that
uses a combination of fields: one 'signtype', zero or more 'signoption'
and one 'sign' in objects. I am thinking that the best thing to do is
replace the gpgsig-sha256 fields in objects and allow old gpgsig (commits)
and in-body (tags) signatures to coexist alongside to give the same
functionality.

That not only paves the way forward but preserves the full backward
compatibility that is one of my top requirements.

Thoughts?

Cheers!
Dave


* Re: Preserving the ability to have both SHA1 and SHA256 signatures
  2021-05-08  2:22 Preserving the ability to have both SHA1 and SHA256 signatures dwh
@ 2021-05-08  6:39 ` Christian Couder
  2021-05-08  6:56   ` Junio C Hamano
  2021-05-09  0:19 ` brian m. carlson
  1 sibling, 1 reply; 21+ messages in thread
From: Christian Couder @ 2021-05-08  6:39 UTC (permalink / raw)
  To: dwh; +Cc: git

Hi,

(Not sure why, but, when using "Reply to all" in Gmail, it doesn't
actually reply to you (or Cc you), only to the mailing list. I had to
manually add your email back.)

On Sat, May 8, 2021 at 4:25 AM <dwh@linuxprogrammer.org> wrote:
>
> Hi Everybody,
>
> I was reading through the
> Documentation/technical/hash-function-transition.txt doc and realized
> that the plan is to support allowing BOTH SHA1 and SHA256 signatures to
> exist in a single object:
>
> > Signed Commits
> > 1. using SHA-1 only, as in existing signed commit objects
> > 2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
> >   fields.
> > 3. using only SHA-256, by only using the gpgsig-sha256 field.
> >
> > Signed Tags
> > 1. using SHA-1 only, as in existing signed tag objects
> > 2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
> >   signature.
> > 3. using only SHA-256, by only using the gpgsig-sha256 field.
>
> The design that I'm working on only supports a single signature that
> uses a combination of fields: one 'signtype', zero or more 'signoption'
> and one 'sign' in objects.

Here I understand that your design doesn't support both a SHA1 and a
SHA256 signature.

> I am thinking that the best thing to do is
> replace the gpgsig-sha256 fields in objects and allow old gpgsig (commits)
> and in-body (tags) signatures to co-exist along side to give the same
> functionality.

Is this part of your design, or a, maybe temporary, alternative to it?

> That not only paves the way forward but preserves the full backward
> compatibility that is one of my top requirements.

There have been patches and discussions about this quite recently, which
were reported on in our Git Rev News newsletter:

https://git.github.io/rev_news/2021/02/27/edition-72/

You can see that, with the latest patches (not sure the documentation
is up-to-date though), signatures on both commits and tags can now be
round-tripped through both SHA-1 and SHA-256 conversions. How is that
not fully backward compatible?

Best,
Christian.


* Re: Preserving the ability to have both SHA1 and SHA256 signatures
  2021-05-08  6:39 ` Christian Couder
@ 2021-05-08  6:56   ` Junio C Hamano
  2021-05-08  8:03     ` Felipe Contreras
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2021-05-08  6:56 UTC (permalink / raw)
  To: Christian Couder; +Cc: dwh, git

Christian Couder <christian.couder@gmail.com> writes:

> Hi,
>
> (Not sure why, but, when using "Reply to all" in Gmail, it doesn't
> actually reply to you (or Cc you), only to the mailing list. I had to
> manually add your email back.)

I am sure why.  DWH, please do not use mail-follow-up-to when
working with this list.  It is rude and wastes people's time (like
the practice just did by stealing time from Christian).

Also cf.
https://lore.kernel.org/git/7v63l6f1mc.fsf@gitster.siamese.dyndns.org/
https://lore.kernel.org/git/7vk3zig92n.fsf@alter.siamese.dyndns.org/




* Re: Preserving the ability to have both SHA1 and SHA256 signatures
  2021-05-08  6:56   ` Junio C Hamano
@ 2021-05-08  8:03     ` Felipe Contreras
  2021-05-08 10:11       ` Stefan Moch
  0 siblings, 1 reply; 21+ messages in thread
From: Felipe Contreras @ 2021-05-08  8:03 UTC (permalink / raw)
  To: Junio C Hamano, Christian Couder; +Cc: dwh, git

Junio C Hamano wrote:
> Christian Couder <christian.couder@gmail.com> writes:
> > (Not sure why, but, when using "Reply to all" in Gmail, it doesn't
> > actually reply to you (or Cc you), only to the mailing list. I had to
> > manually add your email back.)
> 
> I am sure why.  DWH, please do not use mail-follow-up-to when
> working with this list.  It is rude and wastes people's time (like
> the practice just did by stealing time from Christian).

I agree with this, but shouldn't this be written in some kind of mail
etiquette guideline? Along with a rationale.

-- 
Felipe Contreras


* Re: Preserving the ability to have both SHA1 and SHA256 signatures
  2021-05-08  8:03     ` Felipe Contreras
@ 2021-05-08 10:11       ` Stefan Moch
  2021-05-08 11:12         ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Moch @ 2021-05-08 10:11 UTC (permalink / raw)
  To: Felipe Contreras, Junio C Hamano, Christian Couder, dwh; +Cc: git

Felipe Contreras wrote:
> Junio C Hamano wrote:
>> Christian Couder <christian.couder@gmail.com> writes:
>>> (Not sure why, but, when using "Reply to all" in Gmail, it doesn't
>>> actually reply to you (or Cc you), only to the mailing list. I had to
>>> manually add your email back.)
>>
>> I am sure why.  DWH, please do not use mail-follow-up-to when
>> working with this list.  It is rude and wastes people's time (like
>> the practice just did by stealing time from Christian).
> 
> I agree with this, but shouldn't this be written in some kind of mail
> etiquette guideline? Along with a rationale.

Good idea to write this down. How to use the mailing list is only
sparsely documented. The following files talk about sending to the
mailing list:

 1. README.md
 2. Documentation/SubmittingPatches
 3. Documentation/MyFirstContribution.txt
 4. MaintNotes (in Junio's “todo” branch, sent out to the list from
    time to time as “A note from the maintainer”)

2, 3 and 4 mention sending Cc to everyone involved.

2 is about new messages.

3 and 4 specifically talk about keeping everyone in Cc: in replies.
Both in the context of “you don't have to be subscribed and you
don't need to ask for Cc:”.


Please also note that mutt sets the “Mail-Followup-To:” header by
default when sending to known mailing lists, unless “followup_to” is
set to “no”. Whether it omits the sender's address from this header
depends on whether the list address is known to be subscribed to or
is simply known to be a mailing list. It also does not set this
header if no recipient address is known to be a mailing list.

http://www.mutt.org/doc/manual/#followup-to
http://www.mutt.org/doc/manual/#using-lists


* Re: Preserving the ability to have both SHA1 and SHA256 signatures
  2021-05-08 10:11       ` Stefan Moch
@ 2021-05-08 11:12         ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2021-05-08 11:12 UTC (permalink / raw)
  To: Stefan Moch; +Cc: Felipe Contreras, Christian Couder, dwh, git

Stefan Moch <stefanmoch@mail.de> writes:

> Good idea to write this down. How to use the mailing list is only
> sparsely documented. The following files talk about sending to the
> mailing list:
>
>  1. README.md
>  2. Documentation/SubmittingPatches
>  3. Documentation/MyFirstContribution.txt
>  4. MaintNotes (in Junio's “todo” branch, sent out to the list from
>     time to time as “A note from the maintainer”)
>
> 2, 3 and 4 mention sending Cc to everyone involved.
>
> 2 is about new messages.
>
> 3 and 4 specifically talk about keeping everyone in Cc: in replies.
> Both in the context of “you don't have to be subscribed and you
> don't need to ask for Cc:”.

In case somebody wants to write a doc, a better pair of references
than what I quoted earlier to draw material from are:

https://public-inbox.org/git/7v4pndfjym.fsf@assigned-by-dhcp.cox.net/
https://public-inbox.org/git/7vei7zjr3y.fsf@alter.siamese.dyndns.org/



* Re: Preserving the ability to have both SHA1 and SHA256 signatures
  2021-05-08  2:22 Preserving the ability to have both SHA1 and SHA256 signatures dwh
  2021-05-08  6:39 ` Christian Couder
@ 2021-05-09  0:19 ` brian m. carlson
  2021-05-10 12:22   ` Is the sha256 object format experimental or not? Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 21+ messages in thread
From: brian m. carlson @ 2021-05-09  0:19 UTC (permalink / raw)
  To: git


On 2021-05-08 at 02:22:25, dwh@linuxprogrammer.org wrote:
> Hi Everybody,
> 
> I was reading through the
> Documentation/technical/hash-function-transition.txt doc and realized
> that the plan is to support allowing BOTH SHA1 and SHA256 signatures to
> exist in a single object:
> 
> > Signed Commits
> > 1. using SHA-1 only, as in existing signed commit objects
> > 2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
> >   fields.
> > 3. using only SHA-256, by only using the gpgsig-sha256 field.
> > 
> > Signed Tags
> > 1. using SHA-1 only, as in existing signed tag objects
> > 2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
> >   signature.
> > 3. using only SHA-256, by only using the gpgsig-sha256 field.

Yes, this is the case.  We have tests for this case.

> The design that I'm working on only supports a single signature that
> uses a combination of fields: one 'signtype', zero or more 'signoption'
> and one 'sign' in objects. I am thinking that the best thing to do is
> replace the gpgsig-sha256 fields in objects and allow old gpgsig (commits)
> and in-body (tags) signatures to co-exist along side to give the same
> functionality.

You can't do that.  SHA-256 repositories already exist and that would
break compatibility.

> That not only paves the way forward but preserves the full backward
> compatibility that is one of my top requirements.

I've reviewed your proposed design and provided feedback that we need to
preserve this functionality in your new design as well.  People will
want to have that functionality.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US



* Is the sha256 object format experimental or not?
  2021-05-09  0:19 ` brian m. carlson
@ 2021-05-10 12:22   ` Ævar Arnfjörð Bjarmason
  2021-05-10 22:42     ` brian m. carlson
  0 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-10 12:22 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git


On Sun, May 09 2021, brian m. carlson wrote:

> [[PGP Signed Part:Undecided]]
> On 2021-05-08 at 02:22:25, dwh@linuxprogrammer.org wrote:
>> Hi Everybody,
>> 
>> I was reading through the
>> Documentation/technical/hash-function-transition.txt doc and realized
>> that the plan is to support allowing BOTH SHA1 and SHA256 signatures to
>> exist in a single object:
>> 
>> > Signed Commits
>> > 1. using SHA-1 only, as in existing signed commit objects
>> > 2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
>> >   fields.
>> > 3. using only SHA-256, by only using the gpgsig-sha256 field.
>> > 
>> > Signed Tags
>> > 1. using SHA-1 only, as in existing signed tag objects
>> > 2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
>> >   signature.
>> > 3. using only SHA-256, by only using the gpgsig-sha256 field.
>
> Yes, this is the case.  We have tests for this case.
>
>> The design that I'm working on only supports a single signature that
>> uses a combination of fields: one 'signtype', zero or more 'signoption'
>> and one 'sign' in objects. I am thinking that the best thing to do is
>> replace the gpgsig-sha256 fields in objects and allow old gpgsig (commits)
>> and in-body (tags) signatures to co-exist along side to give the same
>> functionality.
>
> You can't do that.  SHA-256 repositories already exist and that would
> break compatibility.

From memory this is at least the second time you've brought up this
point on-list.

My feeling is that almost nobody's using sha256 currently, and we have a
very prominent ALL CAPS warning saying the format is experimental and
may change, see ff233d8dda1 (Documentation: mark
`--object-format=sha256` as experimental, 2020-08-16).

I agree with the docs as they stand, and don't think we should hold back
on changing the object format for sha256 in general if there's a
compelling reason to do so.

Whether this suggested change has a compelling reason is another matter
(I haven't reviewed it).

But it seems to me that if the main person pushing the sha256 effort
disagrees with the content of
Documentation/object-format-disclaimer.txt, we'd be better off at this
point discussing a patch to change the wording there to something to the
effect that we consider the format set in stone at this point.


* Re: Is the sha256 object format experimental or not?
  2021-05-10 12:22   ` Is the sha256 object format experimental or not? Ævar Arnfjörð Bjarmason
@ 2021-05-10 22:42     ` brian m. carlson
  2021-05-13 20:29       ` dwh
  0 siblings, 1 reply; 21+ messages in thread
From: brian m. carlson @ 2021-05-10 22:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git


On 2021-05-10 at 12:22:00, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sun, May 09 2021, brian m. carlson wrote:
> > You can't do that.  SHA-256 repositories already exist and that would
> > break compatibility.
> 
> From memory this is at least the second time you've brought up this
> point on-list.
> 
> My feeling is that almost nobody's using sha256 currently, and we have a
> very prominent ALL CAPS warning saying the format is experimental and
> may change, see ff233d8dda1 (Documentation: mark
> `--object-format=sha256` as experimental, 2020-08-16).

Yes, I agreed to such text because others thought it was a good idea in
case we needed to make a change.  However, we don't need to make an
incompatible change here, so we should avoid that if possible.

Almost nobody is using it because the main forges don't yet support it,
because it's going to be just as much work to support it there as it has
been in Git.  We won't be making it easier by making deliberately
incompatible changes when we don't have to.

> I agree with the docs as they stand, and don't think we should hold back
> on changing the object format for sha256 in general if there's a
> compelling reason to do so.

I am using it and I know of other people who are using it.  There are
people whose companies cannot use SHA-1 for compliance reasons and are
already making use of it.

The problem here is a chicken and egg: nobody's going to use SHA-256
support if it's experimental and their entire repo might end up totally
useless, and it's not going to become stable if nobody uses it.

> But it seems to me that if the main person pushing the sha256 effort
> disagrees with the content of
> Documentation/object-format-disclaimer.txt, we'd be better off at this
> point discussing a patch to change the wording there to something to the
> effect that we consider the format set in stone at this point.

I've been pretty clear up front that I thought the data was stable and
we should avoid making incompatible changes.  It may be that it is still
experimental and may change incompatibly, but if we can avoid that
problem, we should.

I don't personally intend to send a patch removing the note about it
being experimental until I've finished getting object interop done,
since that's the major issue where we might need to make an incompatible
change, but that work is moving slowly.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US



* Re: Is the sha256 object format experimental or not?
  2021-05-10 22:42     ` brian m. carlson
@ 2021-05-13 20:29       ` dwh
  2021-05-13 20:49         ` Konstantin Ryabitsev
                           ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: dwh @ 2021-05-13 20:29 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Ævar Arnfjörð Bjarmason, git

On 10.05.2021 22:42, brian m. carlson wrote:
>Almost nobody is using it because the main forges don't yet support it,
>because it's going to be just as much work to support it there as it has
>been in Git.  We won't be making it easier by making deliberately
>incompatible changes when we don't have to.

I know that you said there is no reason to make a breaking change to the
SHA256 implementation now, but because of what you say above, I think we
still have the opportunity to make breaking changes. In any case I think
we only need to make one breaking change to gain algorithmic agility
going forward and avoid painful, multi-year transitions like the one
you've been executing.

My project to add universal cryptographic signing to Git by using a
standard protocol and generalized configuration to support any
cryptographic signing scheme could also apply to the digests,
and I think it should. If object digests in Git were self-describing
(i.e. they contain an algorithm identifier as well as the digest) then
repos gain "algorithmic agility" and can change algorithms at any time
to keep up as algorithms grow stale and are replaced.

I think Git should externalize the calculation of object digests just
like it externalizes the calculation of object digital signatures.
Cryptography is very difficult to get correct and the dedicated tools
for that (e.g. OpenSSH, OpenSSL, GnuPG, etc) get lots of scrutiny and
have the best chance of getting it right. I don't think Git should try
to do cryptography at all.

Object digests should just be names for objects; Git doesn't really need to
know anything more than "is this the name for that object?". Answering
that question can, and should, be done by an external tool that is
implemented correctly and hardened against attack. I think the only
counter-argument for this approach is performance related. Pipe-forking a
child process and reading/writing over IPC pipes is expensive in terms
of context switching and process setup/teardown, but there are a number
of mitigations I won't go into here.
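
A minimal sketch of the "ask an external tool" idea described above, not
a description of how Git works today. The "external tool" here is only a
stand-in (the Python interpreter running hashlib), and the names `TOOL`,
`external_digest` and `name_matches` are hypothetical:

```python
import hashlib
import subprocess
import sys

# Hypothetical stand-in for an external digest tool: it reads bytes on stdin
# and prints a hex digest on stdout. A real setup would name a dedicated tool.
TOOL = [
    sys.executable,
    "-c",
    "import hashlib, sys; "
    "print(hashlib.sha256(sys.stdin.buffer.read()).hexdigest())",
]


def external_digest(data: bytes) -> str:
    """Pipe `data` to the external tool and return the digest it reports."""
    result = subprocess.run(TOOL, input=data, capture_output=True, check=True)
    return result.stdout.decode("ascii").strip()


def name_matches(name: str, data: bytes) -> bool:
    """Answer "is this the name for that object?" without hashing in-process."""
    return external_digest(data) == name


data = b"blob 6\x00hello\n"
name = external_digest(data)
# Cross-check against an in-process hash, only to show the two agree.
print(name == hashlib.sha256(data).hexdigest())
```

Each call pays for one process spawn and two pipe transfers, which is the
cost the paragraph above refers to.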

I think we should make one last breaking change for digests and not go
with the existing SHA-256 implementation but instead switch to
self-describing digests and digital signatures and rely on external
tools that Git talks to using a standard protocol. We can maintain full
backward compatibility and even support full round tripping using
techniques similar to those that Brian came up with. A transitional
half-old/half-new signed tag could look like:

```
object 04b871796dc0420f8e7561a895b52484b701d51a
obj 0ED_zgYrQg584bCrqKPoUvxaQ5aMis0GtnW_NrZFTTxUlHLUOyp77LanoZEGV6ajhYGLGTaTfCIQhryovyeNFJuG
type commit
tag signedtag
tagger C O Mitter <committer@example.com> 1465981006 +0000
signtype openpgp
sign LS0tLS1CRUdJTiBQR1AgU0lHTkFUVVJFLS0tLS0KVmVyc2lvbjogR251UEcgdjEKCmlRRWN
 CQUFCQWdBR0JRSlhZUmhPQUFvSkVHRUpMb1czSW5HSmtsa0lBSWNuaEw3UndFYi8rUWVYOWVua1
 hoeG4KcnhmZHFydldkMUs4MHNsMlRPdDhCZy9OWXdyVUJ3L1JXSitzZy9oaEhwNFd0dkUxSERHS
 GxrRXozeTExTGt1aAo4dFN4UzNxS1R4WFVHb3p5UEd1RTkwc0pmRXhoWmxXNGtuSVExd3QveVdx
 TSszM0U5cE40aHpQcUx3eXJkb2RzCnE4RldFcVBQVWJTSlhvTWJSUHcwNFM1anJMdFpTc1VXYlJ
 Zam1KQ0h6bGhTZkZXVzRlRmQzN3VxdUlhTFVCUzAKcmtDM0pyeDc0MjBqa0lwZ0ZjVEkyczYwdW
 hTUUx6Z2NDd2RBMnVrU1lJUm5qZy96RGtqOCszaC9HYVJPSjcyeApsWnlJNkhXaXhLSmtXdzhsR
 TlhQU9EOVRtVFc5c0ZKd2NWQXptQXVGWDJrVXJlRFVLTVpkdUdjb1JZR3BEN0U9Cj1qcFhhCi0t
 LS0tRU5EIFBHUCBTSUdOQVRVUkUtLS0tLQo

signed tag

signed tag message body
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
=jpXa
-----END PGP SIGNATURE-----
```

I think a good move to make right now would be to add a general function
for stripping out any number of named fields from objects and also
stripping out in-body signatures found in tags. That way we can add
support in today's Git for stripping out fields/data for things like
creating/verifying the object digest and/or digital signature.
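
One way such a stripping function could look, as a rough sketch only. It
assumes the usual object layout (header lines of the form "key value",
continuation lines starting with a single space, a blank line before the
body), and the function names are hypothetical:

```python
from typing import Iterable

PGP_BEGIN = b"-----BEGIN PGP SIGNATURE-----"


def strip_fields(obj: bytes, names: Iterable[bytes]) -> bytes:
    """Drop the named header fields, including their continuation lines."""
    drop = set(names)
    header, sep, body = obj.partition(b"\n\n")
    kept = []
    skipping = False
    for line in header.split(b"\n"):
        if line.startswith(b" "):
            # Continuation line: shares the fate of the field it belongs to.
            if not skipping:
                kept.append(line)
            continue
        skipping = line.split(b" ", 1)[0] in drop
        if not skipping:
            kept.append(line)
    return b"\n".join(kept) + sep + body


def strip_inbody_signature(obj: bytes) -> bytes:
    """Cut a tag at the start of an in-body PGP signature, if there is one."""
    idx = obj.find(b"\n" + PGP_BEGIN)
    return obj if idx < 0 else obj[: idx + 1]
```

For the example tag above, calling `strip_fields(tag, [b"signtype", b"sign"])`
and then `strip_inbody_signature` on the result would leave only the
unsigned payload.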

BTW, in the example above the 'obj' field is a self-describing, URL-safe
Base64 encoded Blake2b-512 digest encoded using the format described
[here][1]. The starting '0E' Base64 characters identify the digest as
Blake2b-512 and also specify that the length of the digest is 64 bytes.
If you Base64 decode the 'obj' field value you get 66 bytes; the digest
value is the last 64 of those bytes.
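
The layout just described can be sketched as follows. This is one reading
of the paragraph above (two lead bytes whose first two Base64 characters
are overwritten by the type code) and has not been checked against the
linked format document; `encode_digest` and `decode_digest` are
hypothetical names:

```python
import base64
import hashlib

CODE = "0E"       # type code the message associates with Blake2b-512
DIGEST_LEN = 64   # digest length in bytes


def encode_digest(data: bytes) -> str:
    digest = hashlib.blake2b(data, digest_size=DIGEST_LEN).digest()
    # Two zero lead bytes make 66 bytes, which Base64-encode to 88
    # characters with no '=' padding. The first two characters ("AA")
    # are then overwritten with the type code.
    text = base64.urlsafe_b64encode(b"\x00\x00" + digest).decode("ascii")
    return CODE + text[2:]


def decode_digest(text: str) -> bytes:
    if not text.startswith(CODE):
        raise ValueError("unknown digest type code")
    raw = base64.urlsafe_b64decode(text)
    if len(raw) != DIGEST_LEN + 2:
        raise ValueError("unexpected length")
    # The type code only touches the lead bytes, so the digest is intact.
    return raw[-DIGEST_LEN:]
```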

By going with self-describing digests we can have configuration files
that contain 'program' and 'options.*' for each external tool that can
create/validate digests of each type. So in this case there would be
something like:

```
[digest "blake2b"]
  program = "blake2bsum"
[digest "blake2b.options"]
  length = 64
```

Using self-describing cryptographic constructs for digests and
signatures and relying on external tools makes it trivial for Git to
walk the object graph and enumerate all of the digest types and
signature types in a given repo and determine if a user has their
configuration set up correctly to work with that repo. Projects can
declare which types they are using and recommend tools to use for those
types.

Cheers!
Dave

TL;DR

Let me try to lay out the case for making a breaking change to sha256
right now that will future-proof repos going forward.

It has been known for a few decades now that cryptography has a
shelf-life. By that I mean as technology and cryptanalysis improve we
have had to make keys larger and invent new algorithms that resist the
new attacks on cryptography. This has been true of digest algorithms (i.e.
hashes), digital signatures (i.e. non-repudiation), and encryption (i.e.
confidentiality). The relevant case here is the fact that sha256 is
vulnerable to extension attacks and cryptographers have lost some
confidence in it after many Davies-Meyer (DM) structure and ARX network
designs based on MD4 were broken 20 years ago. SHA-256 uses DM plus a
block cipher based on an ARX network. The end result is that in high
security software, SHA-256 is being replaced with SHA-3 and Blake2
digests.

Another key thing to think about is that a git repo is a form of a
provenance log that could become the primary tool for securing the
software supply chain if we were to make some careful, well thought out
changes around the digests and digital signatures. What changes exactly?

1. upgrade the digests to something cryptographically secure.
2. digitally sign all commits/merges/tags using...
3. key material tracked with cryptographically secure provenance logs
inside of the repo itself.
4. switch to "late binding", "self describing" cryptographic constructs.

Let me go over these and describe how these fit together.

1. SHA-1 is not cryptographically secure and SHA-256 is already not
   being used in *new* systems and is being replaced in existing, high
   security systems. I think Git should move to more secure digest
   algorithms because the hashes in Git repos are used as naming
   identifiers for Git objects which gives them a higher security
   burden.

2. Digitally signing all commits/merges/tags is critical to tie
   contributions to contributors in a non-repudiable fashion. At the
   very least it is a more secure solution for S-o-b but it also opens
   up the possibility for cryptographically secure accountability. Banks
   and governments are already doing know-your-customer (KYC)
   verifications of identity that can be used to identify contributors
   and their contributions cryptographically. If privacy is a concern,
   zero-knowledge proofs, based on the KYC authentic data, can be used
   to create pseudonymous identities for contributors that can be linked
   to their real-world identity under judicial order. Essentially a
   developer can say, "you don't need to know my real world identity but
   here's proof that XYZ bank knows who I am and here is a large random
   number you can use to de-anonymize me with the help of a court if
   needed"

3. The key material used for identifying contributors needs to move into
   the repos themselves for many reasons but the most important two
   reasons are (1) the repo comes with all of the data necessary to
   verify all of the digital signatures (i.e. solving the PKI problem
   for a project) and (2) to track the provenance of the public keys and
   other related data that each contributor uses. If Git repos contain
   provenance logs that are controlled and maintained by each
   contributor, those logs can also contain digital signatures over the
   code of conduct and the developer certificate of origin and other
   governing documents for a project that are legally binding (i.e.
   follow eIDAS and other legal digital signature rules). Solving the
   PKI problem alone makes digitally signing commits infinitely more
   useful and will drive adoption. Solving the non-repudiable provenance
   problem is the raison d'être of organizations like the Linux
   Foundation. I think Git should align itself with where technology is
   heading on that front.

4. Currently Git uses "early-binding" for all cryptographic material.
   The digest algorithm is hard coded (SHA-1) and the new SHA-256 is as
   well. The digital signature algorithm is also hard coded as either
   GPG or GPGSM. Early-binding makes it very difficult to plan for the
   obsolescence of cryptographic algorithms. The solution is to move to
   "late-binding"/"self-describing" cryptographic constructs. If Git
   were to switch to self-describing digests and digital signatures,
   then Git could be entirely agnostic to cryptography and rely entirely
   upon external cryptographic tools for creating/verifying digests and
   digital signatures. Instead of the direction we're taking on the
   SHA256 changeover, I think Git should switch to self-describing
   digests and digital signatures and use a standard protocol for
   talking to external cryptographic tools instead of trying to get
   cryptography correct in its code.

   Secure Scuttlebutt uses late-binding constructs that contain a type
   "sigil", a Base64-encoded key/digest/blob, followed by an algorithm
   descriptor (e.g. ".sha256" or ".ed25519"). Other examples exist such
   as the Multihash encoding scheme for self-describing hashes. All of
   my work on secure provenance logs uses the emerging consensus
   encoding described [here][1]. It uses Base64 encoded cryptographic
   data and it fills what would be the padding bytes with type
   identifiers. I'm not the only one thinking along these lines; so are
   the [KERI project][2] at the Decentralized Identity Foundation and
   [Konstantin][3].
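
As a small illustration of a self-describing hash, here is a simplified
Multihash-style sketch. It covers only SHA-1 and SHA-256, whose function
codes and lengths each fit in one byte; the helper names are hypothetical
and this is not a full Multihash implementation:

```python
import hashlib

# Multihash function codes for two algorithms Git already knows.
CODES = {"sha1": 0x11, "sha256": 0x12}
NAMES = {code: name for name, code in CODES.items()}


def mh_encode(algo: str, data: bytes) -> bytes:
    digest = hashlib.new(algo, data).digest()
    # Simplification: code and length both fit in one byte here, so the
    # varint encoding that Multihash specifies is a single byte for each.
    return bytes([CODES[algo], len(digest)]) + digest


def mh_decode(value: bytes):
    code, length, digest = value[0], value[1], value[2:]
    if code not in NAMES or len(digest) != length:
        raise ValueError("not a value this sketch understands")
    return NAMES[code], digest


def mh_verify(value: bytes, data: bytes) -> bool:
    # The value itself says which algorithm to use for verification.
    algo, digest = mh_decode(value)
    return hashlib.new(algo, data).digest() == digest
```

A verifier written this way needs no out-of-band knowledge of which
algorithm produced a given name, which is the "late-binding" property
discussed in item 4.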


[1]: https://github.com/decentralized-identity/keri/blob/master/kids/kid0001.md
[2]: https://identity.foundation/working-groups/keri.html
[3]: https://people.kernel.org/monsieuricon/patches-carved-into-developer-sigchains


* Re: Is the sha256 object format experimental or not?
  2021-05-13 20:29       ` dwh
@ 2021-05-13 20:49         ` Konstantin Ryabitsev
  2021-05-13 23:47           ` dwh
  2021-05-13 21:03         ` Junio C Hamano
  2021-05-18  5:32         ` Jonathan Nieder
  2 siblings, 1 reply; 21+ messages in thread
From: Konstantin Ryabitsev @ 2021-05-13 20:49 UTC (permalink / raw)
  To: dwh; +Cc: brian m. carlson, Ævar Arnfjörð Bjarmason, git

On Thu, May 13, 2021 at 01:29:19PM -0700, dwh@linuxprogrammer.org wrote:
> 3. The key material used for identifying contributors needs to move into
>   the repos themselves for many reasons but the most important two
>   reasons are (1) the repo comes with all of the data necessary to
>   verify all of the digital signatures (i.e. solving the PKI problem
>   for a project) and (2) to track the provenance of the public keys and
>   other related data that each contributor uses. If Git repos contain
>   provenance logs that are controlled and maintained by each
>   contributor, those logs can also contain digital signatures over the
>   code of conduct and the developer certificate of origin and other
>   governing documents for a project that are legally binding (i.e.
>   follow eIDAS and other legal digital signature rules). Solving the
>   PKI problem alone makes digitally signing commits infinitely more
>   useful and will drive adoption. Solving the non-repudiable provenance
>   problem is the raison d'être of organizations like the Linux
>   Foundation. I think Git should align itself with where technology is
>   heading on that front.

Dave:

Check out what we're doing as part of patatt and b4:
https://pypi.org/project/patatt/

It takes your keyring-in-git idea and runs with it -- it would be good to have
your input while the project is still young and widely unknown. :)

-K


* Re: Is the sha256 object format experimental or not?
  2021-05-13 20:29       ` dwh
  2021-05-13 20:49         ` Konstantin Ryabitsev
@ 2021-05-13 21:03         ` Junio C Hamano
  2021-05-13 23:26           ` dwh
  2021-05-14  8:49           ` Ævar Arnfjörð Bjarmason
  2021-05-18  5:32         ` Jonathan Nieder
  2 siblings, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2021-05-13 21:03 UTC (permalink / raw)
  To: dwh; +Cc: brian m. carlson, Ævar Arnfjörð Bjarmason, git

dwh@linuxprogrammer.org writes:

> I think Git should externalize the calculation of object digests just
> like it externalizes the calculation of object digital signatures.

The hashing algorithms used to generate object names have
requirements fundamentally different from those of digital
signatures.  I strongly suspect that that fact would change the
equation when you rethink what you said above.

We can "upgrade" digital signature algorithms fairly easily---nobody
would complain if you suddenly choose a different signing algorithm
over a blob of data, as long as all project participants are aware
(and self-describing datastream helps here) and are capable of
grokking the new algorithm we are adopting.  But because object
names are used by one object to refer to another, and most
importantly, we do not want a single object to have multiple names,
we cannot afford to introduce a new hashing algorithm every time we
feel like it.  In other words, diversity of object naming algorithms
is to be avoided as much as possible, while diversity of signature
algorithms is naturally expected.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is the sha256 object format experimental or not?
  2021-05-13 21:03         ` Junio C Hamano
@ 2021-05-13 23:26           ` dwh
  2021-05-14  8:49           ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 21+ messages in thread
From: dwh @ 2021-05-13 23:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Ævar Arnfjörð Bjarmason, git

On 14.05.2021 06:03, Junio C Hamano wrote:
>dwh@linuxprogrammer.org writes:
>
>> I think Git should externalize the calculation of object digests just
>> like it externalizes the calculation of object digital signatures.
>
>The hashing algorithms used to generate object names has
>requirements fundamentally different from that of digital
>signatures.  I strongly suspect that that fact would change the
>equation when you rethink what you said above.

I agree with you. Object names are exactly that: names. Names for
resources/data must be persistent, as well as global in scope and
uniqueness, and autonomously assigned. What this means is that once an
object has a name, that name shall never change as long as the object
remains unchanged. The names must be unique in the scope of all objects
(e.g. all copies of a repo) and generated without coordination.

Calculating object names using a digest algorithm meets all of these
requirements. Choosing a strong digest algorithm creates a strong
cryptographic binding between the name and the object contents. Using
self-describing digests allows for a repo to switch digest algorithms at
arbitrary points in the history.

I think that objects named with SHA1 digests should remain named with
the SHA1 digest. I do *not* advocate going back and rewriting history
to change all of the object names to a digest with a different
algorithm. Git is a provenance log and history matters. I recommend
preserving all existing names, even if they were created with known-weak
digest algorithms, and making the change to a new algorithm at a
specific point in time (e.g. at a tag). Using self-describing digest
encoding and externalizing digest calculation future-proofs
repositories and allows for preservation of history while allowing
algorithm agility.

To illustrate my point, I envision that a repo could have a history
like this:

object 2923f6fa36614586ea09b4424b438915cc1b9b67 (naked SHA1)
  |
<many objects named with SHA1>
  |
object 5f167fb6b3e96273b564fff0b041fb94fee4d3de (naked SHA1)
  |
<modify Git to ext. digest calculation and self-desc encoding>
  |
object 98c2e1c0965e60b0f137577ac5dd0a5c96ce224d (naked SHA1)
  |
<many objects named with SHA1>
  |
<a project decides to switch to SHA2-256, maybe marked in a tag>
  |
object IAOdLVxteOxQwKa-xn8yCBUkuPkjAqcuQ2V7fKAlao8o (self-desc.SHA2-256)
  |
<many objects named with self-describing SHA2-256 digests>
  |
<a project decides to switch to SHA3-256, maybe marked in a tag>
  |
object EK832G0PFhBFf-Dfgr205UKpUMqmVXJX9ltLwQo4Awct (self-desc.SHA3-256)
  |
<many objects named with self-describing SHA3-256 digests>
  .
  .
  .

Neither decision to switch to SHA2-256 nor to SHA3-256 would require any
code changes. If we continue down the current SHA-256 road, we will have
to repeat that multi-year effort in the future to switch to SHA3 or
something else. Most importantly, the choice of digest algorithm would
be left up to the maintainers of a given repo and not limited to the
algorithms we have hard coded into Git.
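
A minimal sketch of the naming scheme described above, under stated
assumptions: the "algo:hex" tag format and the ALGORITHMS table are
hypothetical (this thread does not define a concrete encoding, and the
example names above use a different, base64-style form), and the digest
is computed in-process rather than by an external tool. It shows only
that untagged SHA-1 names and tagged names can coexist and be verified
by reading the algorithm from the name itself.

```python
import hashlib

# Hypothetical algorithm tags; not an encoding defined by Git or this thread.
ALGORITHMS = {
    "sha1": hashlib.sha1,
    "sha2-256": hashlib.sha256,
    "sha3-256": hashlib.sha3_256,
}


def object_name(obj_type: str, data: bytes, algo: str = "sha1") -> str:
    """Hash Git's object framing: "<type> <size>", a NUL byte, then the content."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    digest = ALGORITHMS[algo](header + data).hexdigest()
    # Legacy names stay naked; newer names carry an explicit algorithm tag.
    return digest if algo == "sha1" else f"{algo}:{digest}"


def verify_name(name: str, obj_type: str, data: bytes) -> bool:
    """Read the algorithm from the name, recompute the digest, and compare."""
    algo, sep, _ = name.partition(":")
    if not sep:
        algo = "sha1"  # no tag means a naked SHA-1 name
    if algo not in ALGORITHMS:
        return False
    return object_name(obj_type, data, algo) == name
```

With this framing, object_name("blob", b"") yields the familiar empty
blob name e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, and a name created
with "sha3-256" verifies without the caller stating the algorithm.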

Brian's work on the SHA-256 switch is valuable. We can leverage a lot of
it to switch to externalized digest calculation and self-describing
digests and never have to worry about doing that again.

Cheers!
Dave

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is the sha256 object format experimental or not?
  2021-05-13 20:49         ` Konstantin Ryabitsev
@ 2021-05-13 23:47           ` dwh
  2021-05-14 13:45             ` Konstantin Ryabitsev
  0 siblings, 1 reply; 21+ messages in thread
From: dwh @ 2021-05-13 23:47 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: brian m. carlson, Ævar Arnfjörð Bjarmason, git

On 13.05.2021 16:49, Konstantin Ryabitsev wrote:
>Check out what we're doing as part of patatt and b4:
>https://pypi.org/project/patatt/
>
>It takes your keyring-in-git idea and runs with it -- it would be good to have
>your input while the project is still young and widely unknown. :)

Konstantin:

That's really clever. I especially love how you're using the list
archive as the provenance log of old keys developers used. That seems
like it would work although I have worries about the security of
X-Developer-Key and the lack of key history immediately available to
`git log` because it's in the list archive and not in the repo directly.
I guess the old keys would still be in your local keyring for `gpg` to
use but it would mark signatures created with old revoked keys as
invalid even though they are valid.

Old keys--even if revoked or compromised--matter in a world of digitally
signed data. As a matter of course, people should rotate their signing
keys on a regular basis. It's just good hygiene. That means that there
will always be old data signed with old keys and those old keys need to
be kept around to validate the old signatures.

My approach has been to move to cryptographically secure provenance logs
that contain key rotation events and commitments to future keys and also
cryptographically linking to arbitrary metadata (e.g. KYC proofs, etc).
The file format is documented using the Community Standard template from
the LF. I'm hoping to move Git to use external tools for all digest and
digital signature operations. Then I can start putting provenance logs
into a ".well-known" path in Git repos, maybe ".plogs" or something.
Then I can write/adapt a signing tool to understand provenance logs
of public keys in the repo instead of the GPG keyring stuff we have
today.

Provenance logs accumulate the full key history of a developer over
time. It represents a second axis of time such that the HEAD of a repo
will have the full key history, for every contributor available to
cryptographic tools for verifying signatures. This makes `git log
--show-signature` operations maximally efficient because we don't have
to check out old keyrings from history to recreate the state GPG was in
when the signature was created.

I still like your approach purely for the "it works right now" aspect of
the solution. Good job. I can't wait to see it in action.

Cheers!
Dave

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is the sha256 object format experimental or not?
  2021-05-13 21:03         ` Junio C Hamano
  2021-05-13 23:26           ` dwh
@ 2021-05-14  8:49           ` Ævar Arnfjörð Bjarmason
  2021-05-14 18:10             ` dwh
  1 sibling, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-14  8:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: dwh, brian m. carlson, git


On Fri, May 14 2021, Junio C Hamano wrote:

> dwh@linuxprogrammer.org writes:
>
>> I think Git should externalize the calculation of object digests just
>> like it externalizes the calculation of object digital signatures.
>
> The hashing algorithms used to generate object names has
> requirements fundamentally different from that of digital
> signatures.  I strongly suspect that that fact would change the
> equation when you rethink what you said above.
>
> We can "upgrade" digital signature algorithms fairly easily---nobody
> would complain if you suddenly choose different signing algorithm
> over a blob of data, as long as all project participants are aware
> (and self-describing datastream helps here) and are capable of
> grokking the new algorithm we are adopting.  But because object
> names are used by one object to refer to another, and most
> importantly, we do not want a single object to have multiple names,
> we cannot afford to introduce a new hashing algorithm every time we
> feel like it.  In other words, diversity of object naming algorithms
> is to be avoided as much as possible, while diversity of signature
> algorithms is naturally expected.

I agree insofar that I don't see a good reason for us to support some
plethora of hash algorithms, but I wouldn't have objections to adding
more if people find them useful for some reason. See e.g. [1] for an
implementation.

But I really don't see how anything you've said would present a
technical hurdle once we have SHA-1<->SHA-256 interop in a good enough
state. At that point we'll support re-hashing on arrival of content
hashed with algorithm X into Y, with a local lookup table between X<=>Y.

So if somebody wants to maintain content hashed with algorithm Z locally
we should easily be able to support that. The "diversity of naming"
won't matter past that local repository, any mention of Z will be
translated to X or Y on fetch/push.
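
A rough sketch of the local X<=>Y lookup table mentioned above, assuming
X is SHA-1 and Y is SHA-256. The TranslationTable class and git_name
helper are hypothetical names for illustration, not Git internals. Only
blobs are handled here: trees, commits and tags embed other object
names in their content, so re-hashing them would also require rewriting
those embedded names first.

```python
import hashlib


def git_name(obj_type, data, hash_fn):
    """Digest of Git's object framing: "<type> <size>", NUL, then content."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hash_fn(header + data).hexdigest()


class TranslationTable:
    """Bidirectional SHA-1 <=> SHA-256 lookup for blobs seen locally."""

    def __init__(self):
        self.sha1_to_sha256 = {}
        self.sha256_to_sha1 = {}

    def add_blob(self, data):
        # Hash the same content under both algorithms and record the pair.
        old = git_name("blob", data, hashlib.sha1)
        new = git_name("blob", data, hashlib.sha256)
        self.sha1_to_sha256[old] = new
        self.sha256_to_sha1[new] = old
        return old, new
```

A name received in one format can then be mapped to the other with a
single dictionary lookup in either direction.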

1. https://lore.kernel.org/git/20191222064809.35667-1-michaeljclark@mac.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is the sha256 object format experimental or not?
  2021-05-13 23:47           ` dwh
@ 2021-05-14 13:45             ` Konstantin Ryabitsev
  2021-05-14 17:39               ` dwh
  0 siblings, 1 reply; 21+ messages in thread
From: Konstantin Ryabitsev @ 2021-05-14 13:45 UTC (permalink / raw)
  To: dwh; +Cc: brian m. carlson, Ævar Arnfjörð Bjarmason, git

On Thu, May 13, 2021 at 04:47:06PM -0700, dwh@linuxprogrammer.org wrote:
> On 13.05.2021 16:49, Konstantin Ryabitsev wrote:
> > Check out what we're doing as part of patatt and b4:
> > https://pypi.org/project/patatt/
> > 
> > It takes your keyring-in-git idea and runs with it -- it would be good to have
> > your input while the project is still young and widely unknown. :)
> 
> Konstantin:
> 
> That's really clever. I especially love how you're using the list
> archive as the provenance log of old keys developers used. That seems
> like it would work although I have worries about the security of
> X-Developer-Key and the lack of key history immediately available to
> `git log` because it's in the list archive and not in the repo directly.
>
> I guess the old keys would still be in your local keyring for `gpg` to
> use but it would mark signatures created with old revoked keys as
> invalid even though they are valid.

Thanks for taking a look at it. I don't view this as much of a problem, since
the goal for the project is specifically end-to-end patch attestation. For git
commits, if they are signed with a key from the in-git keyring, it would
actually be really straightforward to get the valid key at the time of signing
-- you just retrieve the keyring using the date of the commit.

> My approach has been to move to cryptographically secure provenance logs
> that contain key rotation events and commitments to future keys and also
> cryptographically linking to arbitrary metadata (e.g. KYC proofs, etc).
> The file format is documented using the Community Standard template from
> the LF. I'm hoping to move Git to use external tools for all digest and
> digital signature operations. Then I can start putting provenance logs
> into a ".well-known" path in Git repos, maybe ".plogs" or something.
> Then I can write/adapt a signing tool to understand provenance logs
> of public keys in the repo instead of the GPG keyring stuff we have
> today.
> 
> Provenance logs accumulate the full key history of a developer over
> time. It represents a second axis of time such that the HEAD of a repo
> will have the full key history, for every contributor available to
> cryptographic tools for verifying signatures. This makes `git log
> --show-signature` operations maximally efficient because we don't have
> to check out old keyrings from history to recreate the state GPG was in
> when the signature was created.

Hmm... I'm not sure if it's an inefficient operation in the first place. If
the keyring is in the same branch as the commit itself, then you can retrieve
the public key using "git show [commit-sha]:path/to/that/pubkey". If it's in a
different branch, then it's slightly more complicated because then you have to
find a keyring commit corresponding to the commit-date of the object you're
checking. In any case, these are all pretty fast operations in git.
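
The two lookups described above could be scripted roughly as follows.
This is a sketch only: the helper names are hypothetical, and it assumes
the keyring is stored as plain files in the repository. The first helper
builds the "git show <commit>:<path>" call for the same-branch case; the
second uses "git rev-list -n 1 --before=<date> <branch>" to find the
newest keyring commit at or before a given commit date.

```python
import subprocess


def show_pubkey_argv(commit, path):
    """argv for reading a file as it existed at a given commit."""
    return ["git", "show", f"{commit}:{path}"]


def keyring_commit_argv(branch, commit_date):
    """argv for finding the newest commit on `branch` not after `commit_date`."""
    return ["git", "rev-list", "-n", "1", f"--before={commit_date}", branch]


def read_pubkey(commit, path, repo="."):
    """Run the same-branch lookup and return the key file's bytes."""
    result = subprocess.run(
        show_pubkey_argv(commit, path),
        cwd=repo,
        check=True,            # raise if the commit or path does not exist
        capture_output=True,
    )
    return result.stdout
```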

> I still like your approach purely for the "it works right now" aspect of
> the solution. Good job. I can't wait to see it in action.

As you know, this is my third attempt at getting patch attestation off the
ground. The first one I implemented using detached attestation documents and
it was clever and neat, but it was too complicated and failed to take off -- I
think mostly because a) it wasn't easy to understand what it's doing, and b)
it required that people adjust their workflows too much.

The second attempt was better, but I think it was still too complicated,
because it required that we parse patch content, making it fragile and slow on
very large patch sets.

I'm hoping that this version resolves the downsides of the previous two
attempts by both being dumb and simple and by only requiring a simple one-time
setup (via the sendemail-validate hook) with no further changes to the usual
git-send-email workflow after that.

I've not yet widely promoted this, as patatt is a very new project, but I'm
hoping to start reaching out to people to trial it out in the next few weeks.

Thanks,
-K

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is the sha256 object format experimental or not?
  2021-05-14 13:45             ` Konstantin Ryabitsev
@ 2021-05-14 17:39               ` dwh
  0 siblings, 0 replies; 21+ messages in thread
From: dwh @ 2021-05-14 17:39 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: brian m. carlson, Ævar Arnfjörð Bjarmason, git

On 14.05.2021 09:45, Konstantin Ryabitsev wrote:
>As you know, this is my third attempt at getting patch attestation off the
>ground. 

Yes, I've been following. It's been a long road.

>I'm hoping that this version resolves the downsides of the previous two
>attempts by both being dumb and simple and by only requiring a simple one-time
>setup (via the sendemail-validate hook) with no further changes to the usual
>git-send-email workflow after that.

I'm very interested in whether this one works. You and I are completely
aligned on this. I don't think I've paid as much attention to the
emailed patch attestations as you have. I think I understand the
requirements but maybe not all of them. Do you have any threads on
public-inbox where you discuss them? I want to make sure that what I'm
doing doesn't undermine anything you're trying to do. The end goal is to
have an air-tight provenance on all contributions and
accountable/auditable software supply chain. We're all working towards
that.

>I've not yet widely promoted this, as patatt is a very new project, but I'm
>hoping to start reaching out to people to trial it out in the next few weeks.

Hopefully this approach strikes the right balance.

Cheers!
Dave

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is the sha256 object format experimental or not?
  2021-05-14  8:49           ` Ævar Arnfjörð Bjarmason
@ 2021-05-14 18:10             ` dwh
  0 siblings, 0 replies; 21+ messages in thread
From: dwh @ 2021-05-14 18:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, brian m. carlson, git

On 14.05.2021 10:49, Ævar Arnfjörð Bjarmason wrote:
>I agree insofar that I don't see a good reason for us to support some
>plethora of hash algorithms, but I wouldn't have objections to adding
>more if people find them useful for some reason. See e.g. [1] for an
>implementation.

I think Git should not try to do any cryptographic operations at all and
rely on external tools that are implemented properly and hardened.
Implementing cryptography isn't just about translating the algorithm
into code but also getting memory security correct, file handling
correct, input security correct, control flow correct (equal cost
multi-path), etc, etc. Most of the cryptography libraries aren't
designed to be misuse resistant. The only one I know of that has that as
a top-line requirement is Hyperledger Ursa [1].

I would like to see us remove all cryptography code (e.g. digests,
digital signatures, etc) from Git and rely on external tools entirely.
If we store the cryptographic material in a self-describing format that
identifies the associated tool as well as the cryptographic data, then
Git can be completely agnostic.

>But I really don't see how anything you've said would present a
>technical hurdle once we have SHA-1<->SHA-256 interop in a good enough
>state. At that point we'll support re-hashing on arrival of content
>hashed with algorithm X into Y, with a local lookup table between X<=>Y.
>
>So if somebody wants to maintain content hashed with algorithm Z locally
>we should easily be able to support that. The "diversity of naming"
>won't matter past that local repository, any mention of Z will be
>translated to X or Y on fetch/push.

Using self-describing formats allows us to honor history and keep old
object names as they are and eliminate all of the added complications you
describe. I think there is a lot of room for errors to creep in when
collaborators have copies of the same repo and they have local mappings
between different hashing algorithms. How is this not setting up for a
combinatorial explosion of data? If the canonical repo uses SHA1 and one
contributor uses SHA2-512, another uses Blake2b-256, and yet another
uses SHA3-384, won't they all have to maintain six different translation
tables for all objects? SHA1 <=> SHA2-512, SHA1 <=> Blake2b-256, SHA1
<=> SHA3-384, SHA2-512 <=> Blake2b-256, SHA2-512 <=> SHA3-384, and
Blake2b-256 <=> SHA3-384? I guess that's your motivation for not
allowing algorithmic agility.
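
The count of six can be checked with a short sketch. It assumes, as the
paragraph above does, that a table is needed for every unordered pair of
algorithms in use, which gives n(n-1)/2 tables for n algorithms. The
function name is hypothetical.

```python
from itertools import combinations


def pairwise_tables(algorithms):
    """Every unordered pair of algorithms that would need a translation table."""
    return list(combinations(algorithms, 2))


tables = pairwise_tables(["SHA1", "SHA2-512", "Blake2b-256", "SHA3-384"])
for left, right in tables:
    print(f"{left} <=> {right}")
```

Adding a fifth algorithm under the same assumption would raise the
count from six to ten.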

The way around this is to use self-describing formats and external
tools. Git repo copies wouldn't be required to have only *one* algorithm
naming all objects, requiring the translation tables. Instead Git repos
would/could have heterogeneous object names, each one with a single name
generated with a different digest algorithm. Git would simply consider
those names as plain strings and validating those strings requires
talking to the correct external tool, sending the name string and the
object data and reading back the result.

I think this is a much better approach because:

1. It creates algorithmic agility in a way that isn't top-down and heavy
handed.

2. It eliminates the need for all of the translation tables and
round-tripping complexity.

3. It empowers maintainers to decide which algorithms can/must be used
when naming objects in a given repo. Merge hooks, CI/CD checks and
etiquette guides can be used to enforce this.

4. Git's attack surface becomes smaller (a very good thing) and limited
to doing IPC to external tools correctly and securely (easy) instead of
trying to get cryptography client code correct (very difficult).

One other thing to consider is that there are new tools being developed
that do similar things as Git that do have algorithmic agility and use
self-describing cryptographic primitives. Late-binding trust is now a
best practice and has been for quite some time. Many people rely upon
Git and I think we should keep up with the best practices.

Cheers!
Dave

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is the sha256 object format experimental or not?
  2021-05-13 20:29       ` dwh
  2021-05-13 20:49         ` Konstantin Ryabitsev
  2021-05-13 21:03         ` Junio C Hamano
@ 2021-05-18  5:32         ` Jonathan Nieder
  2 siblings, 0 replies; 21+ messages in thread
From: Jonathan Nieder @ 2021-05-18  5:32 UTC (permalink / raw)
  To: dwh; +Cc: brian m. carlson, Ævar Arnfjörð Bjarmason, git

Hi,

dwh@linuxprogrammer.org wrote:

> I think we should make one last breaking change for digests and not go
> with the existing SHA-256 implementation but instead switch to
> self-describing digests and digital signatures and rely on external
> tools that Git talks to using a standard protocol. We can maintain full
> backward compatibility and even support full round tripping using some
> of the similar techniques that Brian came up with.

Forgive my ignorance: can you describe what compatibility break you
mean?  Do you mean _removing_ support for gpgsig-sha256?  If so,
why --- couldn't you get the same benefit by introducing the new
functionality you're describing without getting rid of historical
functionality at the same time?

A nice thing about signatures is that they don't change the semantics
of the object.  So some future version of Git can remove support for
verifying them, if they turn out

By the way, to be clear, the hash-function-transition doc in
Documentation/technical/ is not by Brian alone.  It is the result of
collaboration by various people on list (see its git history for
details).

[...]
> object 04b871796dc0420f8e7561a895b52484b701d51a
> obj 0ED_zgYrQg584bCrqKPoUvxaQ5aMis0GtnW_NrZFTTxUlHLUOyp77LanoZEGV6ajhYGLGTaTfCIQhryovyeNFJuG
> type commit
> tag signedtag
> tagger C O Mitter <committer@example.com> 1465981006 +0000
> signtype openpgp
> sign LS0tLS1CRUdJTiBQR1AgU0lHTkFUVVJFLS0tLS0KVmVyc2lvbjogR251UEcgdjEKCmlRRWN
> CQUFCQWdBR0JRSlhZUmhPQUFvSkVHRUpMb1czSW5HSmtsa0lBSWNuaEw3UndFYi8rUWVYOWVua1
> hoeG4KcnhmZHFydldkMUs4MHNsMlRPdDhCZy9OWXdyVUJ3L1JXSitzZy9oaEhwNFd0dkUxSERHS
> GxrRXozeTExTGt1aAo4dFN4UzNxS1R4WFVHb3p5UEd1RTkwc0pmRXhoWmxXNGtuSVExd3QveVdx
> TSszM0U5cE40aHpQcUx3eXJkb2RzCnE4RldFcVBQVWJTSlhvTWJSUHcwNFM1anJMdFpTc1VXYlJ
> Zam1KQ0h6bGhTZkZXVzRlRmQzN3VxdUlhTFVCUzAKcmtDM0pyeDc0MjBqa0lwZ0ZjVEkyczYwdW
> hTUUx6Z2NDd2RBMnVrU1lJUm5qZy96RGtqOCszaC9HYVJPSjcyeApsWnlJNkhXaXhLSmtXdzhsR
> TlhQU9EOVRtVFc5c0ZKd2NWQXptQXVGWDJrVXJlRFVLTVpkdUdjb1JZR3BEN0U9Cj1qcFhhCi0t
> LS0tRU5EIFBHUCBTSUdOQVRVUkUtLS0tLQo
[...]
> I think a good move to make right now would be to add a general function
> for stripping out any number of named fields from objects and also
> stripping out in-body signatures found in tags. That way we can add
> support in today's Git for stripping out fields/data for things like
> creating/verifying the object digest and/or digital signature.

Can you say a little more about the user-facing model here?  How does
a user know whether the signature verification result they're looking
at describes the part of the object they care about or has stripped it
out?

[...]
> Let me try to lay out the case for making a breaking change to sha256
> right now that will future-proof repos going forward.
>
> It has been known for a few decades now that cryptography has a
> shelf-life.

Yes, this is a key assumption of the hash function transition.  It is
meant to be repeatable, so that we are not stuck on a particular
cryptographic hash.

[...]
>                                       The end result is that in high
> security software, SHA-256 is being replaced with SHA-3 and Blake2
> digests.

Do you mean that practice is drifting away from the conclusion of
https://www.imperialviolet.org/2017/05/31/skipsha3.html?  Where can I
read more?

It took a while to decide on sha256 as the hash for Git to use to
replace sha1.  The process involved useful feedback from Keccak team
and others, and I feel pretty comfortable with how thoroughly it was
discussed, though of course I wouldn't be surprised if the state of
cryptanalysis has changed in some way since then.

The front runners were from the SHA2, SHA3, and Blake2 families.  The
main factor that led to deciding on SHA2 is the wide availability of
efficient and trustworthy implementations, in hardware and software.
See https://lore.kernel.org/git/alpine.DEB.2.21.1.1706151122180.4200@virtualbox/#t
and https://lore.kernel.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/#t
for some of the discussion that led there.

[...]
> 4. switch to "late binding", "self describing" cryptographic constructs.

As Junio mentioned, Git does not impose a requirement on the signature
algorithm used in a signature block, including the digest involved.
However, signing history typically involves signing object names, and
object names use a cryptographic hash for other reasons.  If we want
Git to stop using a content addressable object store, that would be a
more fundamental change to its design.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Preserving the ability to have both SHA1 and SHA256 signatures
  2021-05-16 20:57 Preserving the ability to have both SHA1 and SHA256 signatures Personal Sam Smith
@ 2021-05-17  3:23 ` Felipe Contreras
  0 siblings, 0 replies; 21+ messages in thread
From: Felipe Contreras @ 2021-05-17  3:23 UTC (permalink / raw)
  To: Personal Sam Smith, dwh; +Cc: git

Personal Sam Smith wrote:
> One of the essential properties of any good cryptographic system is
> what is called cryptographic algorithm agility. Without it the system
> cannot easily adapt to new attacks and newly discovered weaknesses in
> cryptographic algorithms. Self-describing cryptographic primitives are
> the most convenient enabler for cryptographic agility. One advantage
> of signed hash chained provenance logs is that the whole log must be
> compromised not merely one part of it. Such a log that exhibits
> agility especially through self-describing primitives is self-healing
> in sense that new appendages to the log may use stronger crypto
> primitives which protect earlier entries in the log that use weaker
> primitives. This makes the log (or any such agile self-describing
> verifiable data structure) future proof. It is the best practice for
> designing distributed (over the internet) zero trust computing
> applications. 

This is way above my pay grade, but let me try to interpret the above.

If we have a repository with two digest algorithms:

 2. BLAKE2b (considered non-compromised)
 1. SHA-1 (broken)

We may not be confident on the SHA-1 history (1), but as long as we have
BLAKE2b history (2), we can be confident on that.

The delta between when SHA-1 was broken, and the switch to BLAKE2b
happened, is when the repository could be potentially compromised.

So, it's in the best interest of the repository owners to switch to the
non-compromised version as soon as possible. In fact, it would be better
if the switch happened *BEFORE* SHA-1 was broken.

This is why algorithm agility is important.


But this is not sufficient, because BLAKE2b could get
compromised in the future. The repository owners need to be thinking
ahead to the time when they'll need to make yet another algorithm
switch.

When such a time comes, they need their infrastructure to be able to
perform the switch as fast as possible, ideally right after they've
finalized their decision.


So, if I can summarize your and dwh's proposal: git should be
cryptographic-digest-algorithm-agnostic.


So far this makes sense to me.

The only problem comes when you consider day-to-day operations, which to
be honest have been totally uninterrupted by 15 years of using SHA-1.

At this point it's worth noting that if the git project has a maxim, it
would be a single word: "performance". Nothing else matters.

So, if you suggest to switch from SHA-1 to SHA-256, that's fine; as long
as you can guarantee that *performance* is not affected. This is the
work brian m. carlson seems to have been doing.

On the other hand, what dwh seemed to suggest is to support every digest
algorithm on the horizon--without regard for how that would affect
performance--and as expected that didn't land very smoothly.


But I don't think the two approaches are incompatible.

All we have to do is reconcile two facts:

  1. The ability for users to switch to a new digest is important
  2. We don't want users to be switching algorithms every other commit

If git can switch the digest algorithm on a per-repository basis, I
don't think anybody would have a problem with that.

Git could support SHA-1, SHA-256, and BLAKE2b as of today. The
repository owners can decide which algorithm to choose today, and their
past history would not be affected.

This is future-proof, and would make repository owners be able to make
that decision, not git.

If at some point in the future people want to start to get ready for
SHA-4, that could be introduced to the git core, *before* people want to
make such transition, and *after* the project has made sure such change
does not impact on performance.

Or am I missing something?

Cheers.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Preserving the ability to have both SHA1 and SHA256 signatures
@ 2021-05-16 20:57 Personal Sam Smith
  2021-05-17  3:23 ` Felipe Contreras
  0 siblings, 1 reply; 21+ messages in thread
From: Personal Sam Smith @ 2021-05-16 20:57 UTC (permalink / raw)
  To: dwh; +Cc: git

dwh invited me to contribute to this discussion and I hope my comments are helpful. He referenced my work on the DIF KERI WG standard. This emerging standard has been adopted by the Global Legal Entity Identifier Foundation (GLEIF) as the basis for its new verifiable LEIs. These are required by many regulatory bodies for participating legal entities.
https://keri.one  
https://identity.foundation/working-groups/keri.html 
https://www.gleif.org/en/lei-solutions/gleifs-digital-strategy-for-the-lei/introducing-the-verifiable-lei-vlei

This is part of a much larger effort to fix the security of internet distributed systems in general. The approach is based on the principles of what I like to call zero-trust-computing (ZTC), which is a generalization of the more commonly known zero-trust-networking (ZTN). Zero trust means "never trust, always verify", where "verify" is meant in the cryptographic sense of verifying cryptographic operations such as signatures or digests. ZTN is becoming increasingly popular for access control of networked applications. In contrast, ZTC applies ZTN principles, merged with trusted computing principles, to the architecture of any distributed software application.
https://trustedcomputinggroup.org
https://github.com/WebOfTrustInfo/rwot7-toronto/blob/master/final-documents/A_DID_for_everything.pdf
https://github.com/WebOfTrustInfo/rwot10-buenosaires/blob/master/final-documents/quantum-secure-dids.pdf

The core idea of zero-trust is end-to-end verifiability of all operations in the system. The type of operation is application dependent. The verifiability is cryptographic. One of the most important (and most relevant to git) types of end-to-end verifiability is authenticity via non-repudiable signatures. A signature is also a hash (digest) so it secures both the integrity of and attribution to the source of that data. 

In trusted computing one starts with secure roots-of-trust that one may then build the rest of the system upon. In distributed trusted computing the root-of-trust is a verifiable data structure https://www.continusec.com/static/VerifiableDataStructures.pdf  https://transparency.dev/verifiable-data-structures/ https://www.bbva.com/en/on-building-a-verifiable-log-part-1-core-ideas/

The point is that a verifiable data structure provides an end-verifiable proof of some state. It becomes a verifiable state machine, which means any software application may be made verifiable using verifiable data structures. The verifiable data structure provides a secure root-of-trust that satisfies the end-verifiability principle of zero-trust computing needed for distributed systems. An open end-verifiable system may exhibit ambient verifiability, that is, any copy is verifiable by anyone, anywhere, at any time.

One of the simplest forms of a verifiable data structure is a hash-chained, signed, append-only log such as a provenance log (as proposed above by dwh). A variant would be a hash-chained signed DAG. The degree of security or cryptographic strength of the log is a function of the cryptographic strength of both the digest and signature operations. Unlike what is popularly portrayed in movies, a crypto system with at least 128 bits of cryptographic strength is practically infeasible to attack by brute force. Instead, an attack must be some sort of side-channel attack, usually against one of three targets: key creation and storage infrastructure, data signing infrastructure, or signature verification infrastructure.
https://github.com/SmithSamuelM/Papers/blob/master/whitepapers/IdentifierTheory_web.pdf
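
As a rough illustration of the hash-chained signed append-only log described above, here is a minimal Python sketch. It is not dwh's proposal and not anything git does today. The entry layout, the function names and SECRET_KEY are all hypothetical, and an HMAC tag stands in for a real public-key signature only so that the example runs with the standard library. An HMAC is a shared-secret MAC and does not provide non-repudiation.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. A real log would use a public-key signature scheme.
SECRET_KEY = b"demo-key"


def entry_digest(entry):
    """Digest of an entry's content, including the link to the previous entry."""
    payload = json.dumps(
        {"prev": entry["prev"], "data": entry["data"]}, sort_keys=True
    ).encode()
    return hashlib.sha256(payload).hexdigest()


def sign(digest):
    """Stand-in for a signature: an HMAC tag over the entry digest."""
    return hmac.new(SECRET_KEY, digest.encode(), hashlib.sha256).hexdigest()


def append(log, data):
    """Append an entry that is chained to the digest of the previous entry."""
    prev = log[-1]["digest"] if log else ""
    entry = {"prev": prev, "data": data}
    entry["digest"] = entry_digest(entry)
    entry["tag"] = sign(entry["digest"])
    log.append(entry)


def verify(log):
    """Check every link, every digest and every tag from the first entry on."""
    prev = ""
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry_digest(entry) != entry["digest"]:
            return False
        if not hmac.compare_digest(sign(entry["digest"]), entry["tag"]):
            return False
        prev = entry["digest"]
    return True


log = []
append(log, "first entry")
append(log, "second entry")
print(verify(log))            # True
log[0]["data"] = "tampered"
print(verify(log))            # False
```

Changing any earlier entry changes its digest, which breaks the link stored in the next entry, so the modification is detected when the log is verified.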

For the first two (key creation/storage and data signing) there are many well-known techniques, such as secure enclaves, TPMs, HSMs, and TEEs, as well as threshold structures like multi-sig, that may provide arbitrarily high levels of security. The third target, signature verification, usually depends on using secure code libraries. But the last two, namely data signing and signature verification infrastructure, require secure delivery of the code as integrated into the application that consumes it. The result is that when designing zero-trust computing systems based on verifiable data structures, the weakest link is a side-channel attack, the weakest link for side-channel attacks is often the secure code delivery mechanism, and the weakest link for secure code delivery is often git.

What dwh is proposing is converting git from a software application with what the security community would consider antiquated security to a best-of-breed security system based on zero-trust-computing principles. This conversion does not come from imbuing git with its own security system for end-verifiable authenticity but instead from layering git on top of a secure end-verifiable authenticity layer outside of git. This layering is enabled by using self-describing cryptographic primitives inside a self-describing verifiable data structure. Self-describing verifiable data structures are to the security world what JSON is to the API world. By using self-describing primitives (such as a self-describing hash) in git's data structures, those data structures become end-verifiable themselves. A signature on a secure digest is a convenient way of making secure attribution to the associated data without signing the data itself. But this requires that the digest be at least as secure as the signature. A secure digest also has the property of post-quantum protection. So secure digests such as BLAKE2b, SHA-3, and BLAKE3 can be used to protect signature schemes that are not post-quantum proof from surprise quantum attack.
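
To make "self-describing hash" concrete, here is a small Python sketch. The "<algorithm>:<hex digest>" text encoding, the ALGORITHMS table and the function names are assumptions made up for this example. They are not git's object format, not dwh's proposed fields, and not the multihash wire format. The point is only that the digest value itself says which algorithm produced it, so a verifier needs no outside configuration to check it.

```python
import hashlib
import hmac

# Hypothetical allow-list mapping an algorithm label to a hashlib constructor.
ALGORITHMS = {
    "sha256": hashlib.sha256,
    "sha3_256": hashlib.sha3_256,
    "blake2b": hashlib.blake2b,
}


def describe(algorithm, data):
    """Return a self-describing digest of the form '<algorithm>:<hex digest>'."""
    return f"{algorithm}:{ALGORITHMS[algorithm](data).hexdigest()}"


def check(tagged, data):
    """Verify data against a self-describing digest by reading its own label."""
    algorithm, _, hexdigest = tagged.partition(":")
    if algorithm not in ALGORITHMS:
        return False  # unknown or disallowed algorithm
    expected = ALGORITHMS[algorithm](data).hexdigest()
    return hmac.compare_digest(expected, hexdigest)


tagged = describe("blake2b", b"tree contents")
print(tagged.split(":")[0])               # blake2b
print(check(tagged, b"tree contents"))    # True
print(check(tagged, b"other contents"))   # False
```

A signature would then be made over the tagged digest string rather than over the data itself, which is why the digest has to be at least as strong as the signature.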

One of the essential properties of any good cryptographic system is what is called cryptographic algorithm agility. Without it the system cannot easily adapt to new attacks and newly discovered weaknesses in cryptographic algorithms. Self-describing cryptographic primitives are the most convenient enabler for cryptographic agility. One advantage of signed hash-chained provenance logs is that the whole log must be compromised, not merely one part of it. Such a log that exhibits agility, especially through self-describing primitives, is self-healing in the sense that new entries appended to the log may use stronger crypto primitives which protect earlier entries in the log that use weaker primitives. This makes the log (or any such agile self-describing verifiable data structure) future proof. It is the best practice for designing distributed (over the internet) zero-trust computing applications.
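
The self-healing property can be sketched in a few lines of Python. This is only one possible construction, under assumptions of my own choosing: each new entry stores a self-describing digest of the entire serialized log so far (a simplification, so that a later, stronger digest directly covers every earlier entry), the "<algorithm>:<hex>" encoding and the ALLOWED set are hypothetical, and signatures are left out to keep the example short.

```python
import hashlib
import json

# Hypothetical allow-list of algorithm labels accepted by the verifier.
ALLOWED = {"sha1", "sha3_256"}


def tagged_digest(algorithm, data):
    """Self-describing digest '<algorithm>:<hex digest>' (illustrative encoding)."""
    if algorithm not in ALLOWED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"


def serialize(entries):
    """Deterministic byte serialization of a list of entries."""
    return json.dumps(entries, sort_keys=True).encode()


def append(log, data, algorithm):
    """Append an entry whose 'prev' digest covers every earlier entry."""
    log.append({"data": data, "prev": tagged_digest(algorithm, serialize(log))})


def verify(log):
    """Recompute each 'prev' digest using the algorithm named inside it."""
    for i, entry in enumerate(log):
        algorithm = entry["prev"].partition(":")[0]
        if algorithm not in ALLOWED:
            return False
        if tagged_digest(algorithm, serialize(log[:i])) != entry["prev"]:
            return False
    return True


log = []
append(log, "old entry", "sha1")            # written with a weaker digest
append(log, "newer entry", "sha1")
append(log, "latest entry", "sha3_256")     # stronger digest over all of the above
print(verify(log))                          # True
log[0]["data"] = "rewritten history"
print(verify(log))                          # False
```

Because the latest entry's SHA3-256 digest covers the older entries, an attacker who can find SHA-1 collisions still has to defeat the stronger digest to rewrite them. In a real log the head entry would also be signed, since nothing in this sketch protects the most recent entry.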

It is my prediction that over the next few years there will be a rapid switchover to the use of zero-trust computing architectures based on self-describing verifiable data structures for distributed internet applications. It is the most elegant, most decentralized solution to the security problems of distributed internet applications. Because of git's important role in code creation and delivery, it should IMHO be leading in this space, and dwh's proposal does just that. Not fixing git in this way will eventually force workarounds for anyone seriously implementing zero-trust architectures. This will result in non-standard, usually proprietary, implementations of access control mechanisms in an attempt to fix up the relatively antiquated security of git tooling. This will be bad for everyone, as it will balkanize git tooling along proprietary access control mechanisms (which is already happening). An open, interoperable, zero-trust, future-proofed secure git requires that git be secured by a verifiable substrate such as dwh is proposing, not by some antiquated mechanism as is the case today.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-05-18  5:33 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-08  2:22 Preserving the ability to have both SHA1 and SHA256 signatures dwh
2021-05-08  6:39 ` Christian Couder
2021-05-08  6:56   ` Junio C Hamano
2021-05-08  8:03     ` Felipe Contreras
2021-05-08 10:11       ` Stefan Moch
2021-05-08 11:12         ` Junio C Hamano
2021-05-09  0:19 ` brian m. carlson
2021-05-10 12:22   ` Is the sha256 object format experimental or not? Ævar Arnfjörð Bjarmason
2021-05-10 22:42     ` brian m. carlson
2021-05-13 20:29       ` dwh
2021-05-13 20:49         ` Konstantin Ryabitsev
2021-05-13 23:47           ` dwh
2021-05-14 13:45             ` Konstantin Ryabitsev
2021-05-14 17:39               ` dwh
2021-05-13 21:03         ` Junio C Hamano
2021-05-13 23:26           ` dwh
2021-05-14  8:49           ` Ævar Arnfjörð Bjarmason
2021-05-14 18:10             ` dwh
2021-05-18  5:32         ` Jonathan Nieder
2021-05-16 20:57 Preserving the ability to have both SHA1 and SHA256 signatures Personal Sam Smith
2021-05-17  3:23 ` Felipe Contreras
