git@vger.kernel.org mailing list mirror (one of many)
* Commit SHA1 == SHA1 checksum?
@ 2022-02-05  1:19 Gamblin, Todd
  2022-02-06  0:22 ` Philip Oakley
  0 siblings, 1 reply; 19+ messages in thread
From: Gamblin, Todd @ 2022-02-05  1:19 UTC (permalink / raw)
  To: git@vger.kernel.org

Apologies if this has been asked before, but the closest thing I could find was this thread:

	http://public-inbox.org/git/Pine.LNX.4.62.0504160519330.21837@qynat.qvtvafvgr.pbz/

That thread devolved into a discussion of the security of different hashes and didn’t answer my question.

I want to know when and where git *guarantees* that the snapshot I have checked out has the checksum that git says it does, or if it does at all.

The use case for this is for package managers. I work on Spack (http://github.com/spack/spack if you’re curious) and we download sources from tarballs and git repos (like many similar tools).  For tarballs we require a sha256, and we use it to verify the tarball after download.

For git repos, we would like to require a commit sha1, provided that it’s basically as secure as downloading a tarball and checking it against a known sha1.  So, if I `git clone` something, is the commit sha1 actually verified?

Thanks,
-Todd


PS: I know that sha1 has been declared “risky” by NIST and that folks should move away from it, and please be assured that we’re using sha256’s everywhere else.  Here I really just want to know whether cloning a git repo at a particular commit is as secure as downloading a tarball and checking it against a sha1, not whether or not sha1 is secure.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-05  1:19 Commit SHA1 == SHA1 checksum? Gamblin, Todd
@ 2022-02-06  0:22 ` Philip Oakley
  2022-02-06  9:00   ` Gamblin, Todd
  2022-02-06 10:15   ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Philip Oakley @ 2022-02-06  0:22 UTC (permalink / raw)
  To: Gamblin, Todd, git@vger.kernel.org

On 05/02/2022 01:19, Gamblin, Todd wrote:
> Apologies if this has been asked before, but the closest thing I could find was this thread:
>
> 	http://public-inbox.org/git/Pine.LNX.4.62.0504160519330.21837@qynat.qvtvafvgr.pbz/
>
> That thread devolved into a discussion of the security of different hashes and didn’t answer my question.
>
> I want to know when and where git *guarantees* that the snapshot I have checked out has the checksum that git says it does, or if it does at all.
>
> The use case for this is for package managers. I work on Spack (http://github.com/spack/spack if you’re curious) and we download sources from tarballs and git repos (like many similar tools).  For tarballs we require a sha256, and we use it to verify the tarball after download.
>
> For git repos, we would like to require a commit sha1, provided that it’s basically as secure as downloading a tarball and checking it against a known sha1.  So, if I `git clone` something, is the commit sha1 actually verified?
For the Git releases, the maintainer, Junio, will PGP sign the release
tag with his key e.g.
https://git.kernel.org/pub/scm/git/git.git/tag/?h=v2.35.1

The tag contains the sha1 hash of the release commit, which in turn
contains the sha1 hashes of the tree that is being released, and the 
previous commit in the git history, and onward the hashes roll...
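You can walk that chain by hand with `git cat-file -p`, which prints an object's raw headers. The sketch below makes some assumptions. It expects a local clone and an annotated tag, because a lightweight tag names the commit directly and has no "object" header. `tag_chain` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: follow the hashes from an annotated tag to its commit, then to
# that commit's tree and parent commit(s).
tag_chain() {
    repo=$1
    tag=$2
    # Object headers end at the first blank line; '/^$/q' stops sed there
    # so a tag or commit message can never be mistaken for a header.
    # The tag object's "object" header names the tagged commit.
    commit=$(git -C "$repo" cat-file -p "$tag" | sed -n -e '/^$/q' -e 's/^object //p')
    # The commit object's "tree" header names the snapshot it records.
    tree=$(git -C "$repo" cat-file -p "$commit" | sed -n -e '/^$/q' -e 's/^tree //p')
    echo "commit $commit"
    echo "tree $tree"
    # Each "parent" header names a previous commit in the history.
    git -C "$repo" cat-file -p "$commit" | sed -n -e '/^$/q' -e '/^parent /p'
}
```

Each printed hash can be passed back to `git cat-file -p` to continue the walk, down into subtrees and blobs or back through the parents.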

https://lore.kernel.org/git/xmqqh7n5zv2b.fsf@gitster.c.googlers.com/ is
a recent discussion on the refreshing of the PGP key. The post
https://lore.kernel.org/git/YA3nwFcYz4tbhrlO@camp.crustytoothpaste.net/
in the thread notes "The signature is .. over the uncompressed .tar ...
You therefore need to uncompress it first with gunzip."
>
> Thanks,
> -Todd
>
>
> PS: I know that sha1 has been declared “risky” by NIST and that folks should move away from it, and please be assured that we’re using sha256’s everywhere else.  Here I really just want to know whether cloning a git repo at a particular commit is as secure as downloading a tarball and checking it against a sha1, not whether or not sha1 is secure.
>
>
I don't think there is an obvious cross-check for the tarball sha1
comparison with the release tag's sha1, if that's the question.

The repeatability of tarballs has been discussed but I didn't find a
mail reference immediately.

Philip

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-06  0:22 ` Philip Oakley
@ 2022-02-06  9:00   ` Gamblin, Todd
  2022-02-06 10:23     ` Johannes Sixt
  2022-02-06 10:15   ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Gamblin, Todd @ 2022-02-06  9:00 UTC (permalink / raw)
  To: Philip Oakley; +Cc: git@vger.kernel.org

Thanks for the quick response.

> I don't think there is an obvious cross-check for the tarball sha1
> comparison with the release tag's sha1, if that's the question.

This is pretty much the question — does git do an integrity check on clone to verify that the commit hash (and its tree hash) are valid?  Does git verify objects as they’re written to disk when it’s cloning a repo?

> The tag contains the sha1 hash of the release commit, which in turn
> contains the sha1 hashes of the tree that is being released, and the 
> previous commit in the git history, and onward the hashes roll...

It seems like git fsck is supposed to check all of these, so would that be the potential analog? It seems like overkill if all I really want is the integrity of one commit snapshot.  Would it be sufficient to check the hash of the checked-out commit and then to check its tree hash? I guess I’m just curious why git doesn’t have a command that verifies the integrity of the current working tree against its commit sha1.

> https://lore.kernel.org/git/YA3nwFcYz4tbhrlO@camp.crustytoothpaste.net/
> in the thread notes "The signature is .. over the uncompressed .tar ...
> You therefore need to uncompress it first with gunzip."

This is a good point and would make it hard to tar up a repo into any verifiable single file for download, given only a commit sha1 for verification: gunzipping without the tarball’s checksum is unsafe. I guess I would like to know if I can even verify the integrity of a checked-out git repo by commit sha1 before I try to do a tarball or anything more complex.

Thanks again,
Todd






^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-06  0:22 ` Philip Oakley
  2022-02-06  9:00   ` Gamblin, Todd
@ 2022-02-06 10:15   ` Junio C Hamano
  2022-02-06 19:25     ` Philip Oakley
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2022-02-06 10:15 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Gamblin, Todd, git@vger.kernel.org

Philip Oakley <philipoakley@iee.email> writes:

> The tag contains the sha1 hash of the release commit, which in turn
> contains the sha1 hashes of the tree that is being released, and the 
> previous commit in the git history, and onward the hashes roll...

That's how a signed tag protects the commit it points at, and
everything reachable from it.  However much or little trust you place
in SHA-1 when validating tarball.tar against its known SHA-1 checksum,
you can trust to the same degree that the commit pointed to by
a tag is what the person who signed (with GPG) the tag wanted the
tag to point at, and in turn the trees and blobs in that commit are
what the signer wanted to have in that tagged commit, ad infinitum,
in the space dimension.  At the same time, because a commit object
records the hashes of the commit objects that are its parents, the
whole history of the project going back to inception can be trusted
to the same degree in the time dimension.

> https://lore.kernel.org/git/xmqqh7n5zv2b.fsf@gitster.c.googlers.com/ is
> a recent discussion on the refreshing of the PGP key. the post
> https://lore.kernel.org/git/YA3nwFcYz4tbhrlO@camp.crustytoothpaste.net/
> in the thread notes "The signature is .. over the uncompressed
> .tar ... You therefore need to uncompress it first with gunzip."

That thread has very little to do with the way Git objects are
cryptographically protected, which I discussed earlier in this
message.

Instead, it was a discussion about how the checksum files and
tarballs at

    https://www.kernel.org/pub/software/scm/git/

relate to each other.  A typical release tarball for Git version
$VERSION has multiple tarballs git-$VERSION.tar.{gz,bz2,xz,...} and
they all uncompress back to the same git-$VERSION.tar tarball.

There is a git-$VERSION.tar.sign file next to them in the same
directory.  The file is supposed to contain a detached signature
over the uncompressed version of the archive.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-06  9:00   ` Gamblin, Todd
@ 2022-02-06 10:23     ` Johannes Sixt
  0 siblings, 0 replies; 19+ messages in thread
From: Johannes Sixt @ 2022-02-06 10:23 UTC (permalink / raw)
  To: Gamblin, Todd; +Cc: git@vger.kernel.org, Philip Oakley

Am 06.02.22 um 10:00 schrieb Gamblin, Todd:
>> I don't think there is an obvious cross-check for the tarball sha1
>> comparison with the release tag's sha1, if that's the question.
> 
> This is pretty much the question — does git do an integrity check on
> clone to verify that the commit hash (and its tree hash) are valid?
> Does git verify objects as they’re written to disk when it’s cloning a repo?

Yes, it does. (How could you even contemplate that it does not? It is
the most obvious way to protect the cloner.) Granted, `git clone` first
writes objects (actually, packs) to the disk before it checks them, but
I don't think that this detail is important.
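A package manager that pins commits could combine those clone-time checks with an explicit check that the pinned commit exists. This sketch assumes the pin is a full 40-hex SHA-1 that is reachable from a branch or tag that `git clone` fetches. `fetch_pinned` is a hypothetical name. When cloning from a local path, `--no-local` makes the clone use the regular transport instead of copying or hardlinking the source's object files. `transfer.fsckObjects` asks Git to run additional sanity checks on the objects it receives.

```shell
#!/bin/sh
# Sketch: clone URL into DEST and check out exactly the pinned commit SHA,
# failing if that commit is not present in the cloned history.
fetch_pinned() {
    url=$1
    sha=$2
    dest=$3
    git -c transfer.fsckObjects=true clone --quiet --no-local --no-checkout \
        "$url" "$dest" || return 1
    # Per the rev-parse documentation, appending "^{commit}" makes sure the
    # name refers to an existing object that is a commit (or peels to one).
    resolved=$(git -C "$dest" rev-parse --quiet --verify "$sha^{commit}") || return 1
    # Reject abbreviated pins, or pins that only peel to the commit (a tag).
    [ "$resolved" = "$sha" ] || return 1
    git -C "$dest" -c advice.detachedHead=false checkout --quiet "$resolved"
}
```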

>> The tag contains the sha1 hash of the release commit, which in turn
>> contains the sha1 hashes of the tree that is being released, and the 
>> previous commit in the git history, and onward the hashes roll...
> 
> It seems like git fsck is supposed to check all of these, so would
> that be the potential analog? It seems like overkill if all I really
> want is the integrity of one commit snapshot.  Would it be sufficient to
> check the hash of the checked out commit and then to check its tree hash…
> I guess I’m just curious why git doesn’t have a command that verifies
> the integrity of the current working tree against its commit sha1.

That command is pretty much 'git status --ignored': if it shows anything
but "nothing to commit, working tree clean", your worktree is different
from the commit. The point is, there certainly are commands that let you
check a worktree against a commit.
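A script-friendly form of that check might look like the sketch below. `worktree_clean` is a hypothetical name. `--porcelain` gives stable output, and that output is empty when there is nothing to report.

```shell
#!/bin/sh
# Sketch: succeed only if the working tree in directory $1 matches HEAD.
# --ignored also reports ignored files, which are not part of the commit
# either; untracked and modified files are reported by default.
worktree_clean() {
    out=$(git -C "$1" status --porcelain --ignored) || return 1
    [ -z "$out" ]
}
```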

-- Hannes

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-06 10:15   ` Junio C Hamano
@ 2022-02-06 19:25     ` Philip Oakley
  2022-02-06 20:02       ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Philip Oakley @ 2022-02-06 19:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Gamblin, Todd, git@vger.kernel.org

On 06/02/2022 10:15, Junio C Hamano wrote:
> Philip Oakley <philipoakley@iee.email> writes:
>
>> The tag contains the sha1 hash of the release commit, which in turn
>> contains the sha1 hashes of the tree that is being released, and the 
>> previous commit in the git history, and onward the hashes roll...
> That's how a signed tag protects the commit it points at, and
> everything reachable from it. 

I was highlighting to Todd that the tag has a PGP signature, which
extends the web of trust to a source of external verification. While it
is implied by 'signed', it may not have been obvious what that meant for
verifying any clone used by Todd and LLNL.

I did notice that the GitHub page
[https://github.com/git/git/releases/tag/v2.35.1] merely provides the
pretty green-bordered 'Verified' icon (management style), but doesn't
show the signature, while the
[https://git.kernel.org/pub/scm/git/git.git/tag/?h=v2.35.1] page does show
the signature (technocrat style) for direct inspection. I expect that for
a `.gov` inspection, the latter gives more immediate confidence before
their internal checks of the signature.

>  However much or little trust you place
> in SHA-1 when validating tarball.tar against its known SHA-1 checksum,
> you can trust to the same degree that the commit pointed to by
> a tag is what the person who signed (with GPG) the tag wanted the
> tag to point at, and in turn the trees and blobs in that commit are
> what the signer wanted to have in that tagged commit, ad infinitum,
> in the space dimension.  At the same time, because a commit object
> records the hashes of the commit objects that are its parents, the
> whole history of the project going back to inception can be trusted
> to the same degree in the time dimension.

Thanks, I had perhaps oversimplified my description.
>
>> https://lore.kernel.org/git/xmqqh7n5zv2b.fsf@gitster.c.googlers.com/ is
>> a recent discussion on the refreshing of the PGP key. the post
>> https://lore.kernel.org/git/YA3nwFcYz4tbhrlO@camp.crustytoothpaste.net/
>> in the thread notes "The signature is .. over the uncompressed
>> .tar ... You therefore need to uncompress it first with gunzip."
> That thread has very little to do with the way Git objects are
> cryptographically protected, which I discussed earlier in this
> message.

My understanding was that Todd does use the tarballs, and wanted to see
how their signature/sha1 value related to the git clone tag and commit
sha1s.
>
> Instead, it was a discussion about how the checksum files and
> tarballs at
>
>     https://www.kernel.org/pub/software/scm/git/
>
> relate to each other.  A typical release tarball for Git version
> $VERSION has multiple tarballs git-$VERSION.tar.{gz,bz2,xz,...} and
> they all uncompress back to the same git-$VERSION.tar tarball.

Reading the thread makes clear that the signature is made over the
expanded (uncompressed) archive. It was that aspect that I was
highlighting. As the thread indicates, not everyone is aware of that.
>
> There is a git-$VERSION.tar.sign file next to them in the same
> directory.  

Our announcements don't actually mention that. The usage of a sign file
isn't always immediately obvious, as shown by the question
https://stackoverflow.com/questions/30699989/how-to-verify-the-integrity-of-a-linux-tarball.

> The file is supposed to contain a detached signature
> over the uncompressed version of the archive.
>
I think part of Todd's question was how the tag and uncompressed archive
'checksums' (e.g. hashes) relate to each other and where those
guarantees come from.

As you explained, the tag's PGP signature, and the contained sha1 hash
of the tip commit, recursively guarantee the full repo. But what it
doesn't have, as far as I understand, is a copy of the appropriate
`hash`/signature for the archive, which would provide the strong link
between the two different views of the release, dispelling any doubt
that they are of the same checked out release.

Hopefully Todd will be able to clarify if that 'archive vs tag' cross
check was part of the question, or whether it was primarily focussed on
Git's internal correctness checks during clone and fsck.

Philip

aside, for Todd, given the mention of sha256: The Git sha256 transition
plan is https://git-scm.com/docs/hash-function-transition/ and
experimental support is available.
 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-06 19:25     ` Philip Oakley
@ 2022-02-06 20:02       ` Junio C Hamano
  2022-02-06 21:33         ` Philip Oakley
  2022-02-07 13:32         ` Konstantin Ryabitsev
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2022-02-06 20:02 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Gamblin, Todd, git@vger.kernel.org

Philip Oakley <philipoakley@iee.email> writes:

> I think part of Todd's question was how the tag and uncompressed archive
> 'checksums' (e.g. hashes) relate to each other and where those
> guarantees come from.

There is no such linkage, and there are no guarantees.  The trust
you may or may not have in the PGP key that signs the tag and the
checksums of the tarball is the only source of such assurance.

More importantly, I do not think there can be any such linkage
between the Git tree and release tarball for a few fundamental
reasons:

 * We add generated files to "git archive" output when creating the
   release tarball for builders' convenience, so if you did

       rm -fr temp && git init temp
       tar Cxf temp git-$VERSION.tar
       git -C temp add . && git -C temp write-tree

   the tree object name that you get out of the last step will not
   match the tree object of the version from my archive (interested
   parties can study "make dist" for more details).

 * Even if we did not add any files to "git archive" output when
   creating a release tarball, a tarball that contains all the
   directories and files from a given git revision is *NOT* unique.
   We do not add randomness to the "git archive" output, just to
   make them unstable, but we have made fixes and improvements to
   the archive generation logic in the past, and we do reserve the
   right to do so in the future.  And it is not limited to the
   "git archive" binary itself; how it is driven matters too, e.g.
   "tar.umask" settings can affect the mode bits.
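The sketch below illustrates the second point by archiving the same commit twice with different `tar.umask` values. The function name is hypothetical, and `sha256sum` from GNU coreutils is assumed. The tree, and therefore the commit, is the same both times. The tar headers record different permission bits, so the archive bytes differ, and so does any checksum over them.

```shell
#!/bin/sh
# Sketch: print the SHA-256 of "git archive" output for HEAD of repo $1,
# generated with tar.umask set to $2.
archive_sha256() {
    repo=$1
    umask_value=$2
    git -C "$repo" -c tar.umask="$umask_value" archive --format=tar HEAD |
        sha256sum | cut -d' ' -f1
}
```

For example, `archive_sha256 . 0022` and `archive_sha256 . 0077` give different results for the same commit. Extracted into an empty repository, both archives would still yield the same tree object.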

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-06 20:02       ` Junio C Hamano
@ 2022-02-06 21:33         ` Philip Oakley
  2022-02-07  8:15           ` Gamblin, Todd
  2022-02-07 13:32         ` Konstantin Ryabitsev
  1 sibling, 1 reply; 19+ messages in thread
From: Philip Oakley @ 2022-02-06 21:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Gamblin, Todd, git@vger.kernel.org

On 06/02/2022 20:02, Junio C Hamano wrote:
> Philip Oakley <philipoakley@iee.email> writes:
>
>> I think part of Todd's question was how the tag and uncompressed archive
>> 'checksums' (e.g. hashes) relate to each other and where those
>> guarantees come from.
> There is no such linkage, and there are no guarantees.  The trust
> you may or may not have in the PGP key that signs the tag and the
> checksums of the tarball is the only source of such assurance.
>
> More importantly, I do not think there can be any such linkage
> between the Git tree and release tarball for a few fundamental
> reasons:
>
>  * We add generated files to "git archive" output when creating the
>    release tarball for builders' convenience, so if you did
>
>        rm -fr temp && git init temp
>        tar Cxf temp git-$VERSION.tar
>        git -C temp add . && git -C temp write-tree
>
>    the tree object name that you get out of the last step will not
>    match the tree object of the version from my archive (interested
>    parties can study "make dist" for more details).
>
>  * Even if we did not add any files to "git archive" output when
>    creating a release tarball, a tarball that contains all the
>    directories and files from a given git revision is *NOT* unique.
>    We do not add randomness to the "git archive" output, just to
>    make them unstable, but we have made fixes and improvements to
>    the archive generation logic in the past, and we do reserve the
>    right to do so in the future.  And it is not limited to the
>    "git archive" binary itself; how it is driven matters too, e.g.
>    "tar.umask" settings can affect the mode bits.
Thanks for the clarification.

Thus what trust there is comes via the two PGP signatures, rather than
directly between the tarball and the git repo.
--
Philip


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-06 21:33         ` Philip Oakley
@ 2022-02-07  8:15           ` Gamblin, Todd
  2022-02-07 13:15             ` Konstantin Ryabitsev
  0 siblings, 1 reply; 19+ messages in thread
From: Gamblin, Todd @ 2022-02-07  8:15 UTC (permalink / raw)
  To: Philip Oakley, Johannes Sixt, Junio C Hamano; +Cc: git@vger.kernel.org

Thanks everyone for the responses.  I think Junio’s comment summed up what I was asking nicely:

> However much or little trust you place
> in SHA-1 when validating tarball.tar against its known SHA-1 checksum,
> you can trust to the same degree that the commit pointed to by
> a tag is what the person who signed (with GPG) the tag wanted the
> tag to point at, and in turn the trees and blobs in that commit are
> what the signer wanted to have in that tagged commit, ad infinitum,
> in the space dimension.  At the same time, because a commit object
> records the hashes of the commit objects that are its parents, the
> whole history of the project going back to inception can be trusted
> to the same degree in the time dimension.

In our case, the initial trust doesn’t come from a PGP signature — it comes (at least for now) from having cloned the package repository from GitHub. So you trust GitHub by cloning over https, and you trust the Spack maintainers to only merge safe things into `develop` or some release branch.  A package description in the repo might have some versions specified by commit, like this:

	https://raw.githubusercontent.com/spack/spack/5ff72ca/var/spack/repos/builtin/packages/acts/package.py

We use the specified commits to build packages from source, or to create (hopefully) reproducible binary packages. Anyway, like the PGP case, you’re given a commit hash from some trusted source.  The question was really whether you can rely on the hash like a sha1 checksum — and it seems like you can.

That said, I guess I do still have one more question — how soon will git notice that a given repo is corrupted/tampered with (insofar as sha1 can do that)?  On checkout?

RE: Johannes:
> (How could you even contemplate that it does not? It is the most obvious way to protect the cloner.)

I have always assumed this was the case but never could find anything in the docs saying explicitly what Junio said above.  It is hard for me to imagine git *not* working this way, but I’ve been asked this question by enough of our package maintainers that I thought I’d bring it up here.

RE: Phil:

> Hopefully Todd will be able to clarify if that 'archive vs tag' cross
> check was part of the question, or whether it was primarily focussed on
> Git's internal correctness checks during clone and fsck.

It wasn’t part of the original question — I was really just asking whether git guarantees that a fresh `git clone` of some commit actually has the stated commit hash.  I realize there’s no relation between the sha1 of a commit and the sha1 or any other hash of its tarball (it’d be a pretty bad hash function if there was).

That said, we are still trying to work out some practical *and secure* way to mirror git commits as a simple download.  I think we need to generate the tarballs ourselves and just add their sha256’s to the package; GitHub does this, and their archive generation logic has changed in the past, as Junio described earlier in this thread.  It’s messy because it requires another checksum that may change, but I don’t see a way around it.  We can’t just tar up a git repo: tar and other compression tools can have vulnerabilities, and we want to checksum any input we pass to them.
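Here is a minimal sketch of checksumming the tarball before tar ever reads it. It assumes GNU coreutils `sha256sum` and a SHA-256 recorded in the package description ahead of time. `checked_extract` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: extract tarball $1 into directory $3 only if its SHA-256
# equals the expected value $2; otherwise leave $3 untouched.
checked_extract() {
    tarball=$1
    expected=$2
    dest=$3
    # Hash from stdin so unusual file names cannot alter sha256sum's output.
    actual=$(sha256sum < "$tarball" | cut -d' ' -f1)
    if [ -z "$expected" ] || [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $tarball" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xf "$tarball" -C "$dest"
}
```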

Thanks again for all the helpful responses.

-Todd





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-07  8:15           ` Gamblin, Todd
@ 2022-02-07 13:15             ` Konstantin Ryabitsev
  2022-02-07 21:08               ` Gamblin, Todd
  0 siblings, 1 reply; 19+ messages in thread
From: Konstantin Ryabitsev @ 2022-02-07 13:15 UTC (permalink / raw)
  To: Gamblin, Todd
  Cc: Philip Oakley, Johannes Sixt, Junio C Hamano, git@vger.kernel.org

On Mon, Feb 07, 2022 at 08:15:58AM +0000, Gamblin, Todd wrote:
> In our case, the initial trust doesn’t come from a PGP signature — it comes
> (at least for now) from having cloned the package repository from GitHub.

Not really the case, if you're relying on a particular commit hash, as you
state. Once you specify a target hash, you don't really have to care where the
repository came from -- the hash is either going to be there and be valid, or
it's not going to be there.

It only matters where the person who picked that hash cloned the repository
from and what steps they took to verify that it is a legitimate commit. If "I
cloned this repository from github" is sufficient for your needs, then that's
fine. The alternative is to use PGP verification, but in either case once you
pick a hash to use, you can rely on git to do all the rest.

> That said, I guess I do still have one more question — how soon will git
> notice that a given repo is corrupted/tampered with (insofar as sha1 can do
> that)?  On checkout?

Yes. I've asked this question before as well:
https://lore.kernel.org/git/20190829141010.GD1797@sigill.intra.peff.net/

The relevant bit:

    Then yes, there is no need to fsck. When the objects were received on
    the server side (by push) and then again when you got them from the
    server (by clone), their sha1s were recomputed from scratch, not
    trusting the sender at all in either case.
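To make "recomputed from scratch" concrete for one object, here is a sketch that assumes the default SHA-1 object format and a `sha1sum` binary. `blob_id_by_hand` is a hypothetical name. A blob's ID is the SHA-1 of a header, "blob <size>" plus a NUL byte, followed by the file's bytes. Git derives the name of each received object this way instead of taking the sender's word for it.

```shell
#!/bin/sh
# Sketch: compute the blob object ID of file $1 without asking Git.
blob_id_by_hand() {
    # wc -c pads its output with spaces on some systems; strip them.
    size=$(wc -c < "$1" | tr -d ' ')
    # Hash the header "blob <size>" and a NUL byte, then the raw content.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

For the same file, its output should match `git hash-object --no-filters <file>`.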

-K

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-06 20:02       ` Junio C Hamano
  2022-02-06 21:33         ` Philip Oakley
@ 2022-02-07 13:32         ` Konstantin Ryabitsev
  2022-02-07 20:57           ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Konstantin Ryabitsev @ 2022-02-07 13:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Philip Oakley, Gamblin, Todd, git@vger.kernel.org

On Sun, Feb 06, 2022 at 12:02:34PM -0800, Junio C Hamano wrote:
> Philip Oakley <philipoakley@iee.email> writes:
> 
> > I think part of Todd's question was how the tag and uncompressed archive
> > 'checksums' (e.g. hashes) relate to each other and where those
> > guarantees come from.
> 
> There is no such linkage, and there are no guarantees.  The trust
> you may or may not have in the PGP key that signs the tag and the
> checksums of the tarball is the only source of such assurance.
> 
> More importantly, I do not think there can be any such linkage
> between the Git tree and release tarball:

Hmm... I've actually considered writing a tool that would verify whether a
tarball corresponds to a signed tag/commit. It should be entirely possible,
no?

1. list all the files in the tar archive and calculate their hashes
2. use that info to rebuild blob and tree hashes all the way to the top of
   the archive tree
3. verify the signature on the git tag and obtain the corresponding tree hash
4. compare the in-git tree hash with your calculated archive tree hash

It would be slow, but it would be perfectly workable. I didn't end up writing
that tool mostly because if you already have a git tree to run "git
verify-tag", then you might as well just run "git archive". :)

The only cases where doing the above would make sense would be specifically
for forensic/verification purposes.
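You can approximate the four steps above by letting Git do the hashing. This rough sketch uses a hypothetical name, `tarball_tree_id`, and makes several assumptions:

* GNU tar is available.
* The archive came from plain `git archive` with no prefix.
* No generated files were added.
* No `export-ignore` or `export-subst` attributes apply.
* The archive contains no empty directories.

```shell
#!/bin/sh
# Sketch: print the Git tree object name that tarball $1's contents would
# have, by extracting it into a throwaway repository and hashing it there.
tarball_tree_id() {
    scratch=$(mktemp -d) || return 1
    # add -A -f stages everything, even paths a .gitignore in the archive
    # would ignore; core.autocrlf=false keeps file contents byte-for-byte.
    # write-tree then hashes the staged content and prints the top tree.
    git -C "$scratch" init -q &&
        tar -xf "$1" -C "$scratch" &&
        git -C "$scratch" -c core.autocrlf=false add -A -f . &&
        git -C "$scratch" write-tree
    rc=$?
    rm -rf "$scratch"
    return "$rc"
}
```

You would then compare the result with `git rev-parse "$tag^{tree}"` in a clone where `git verify-tag "$tag"` has succeeded.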

-K

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-07 13:32         ` Konstantin Ryabitsev
@ 2022-02-07 20:57           ` Junio C Hamano
  2022-02-07 21:34             ` Konstantin Ryabitsev
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2022-02-07 20:57 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Philip Oakley, Gamblin, Todd, git@vger.kernel.org

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> On Sun, Feb 06, 2022 at 12:02:34PM -0800, Junio C Hamano wrote:
>> Philip Oakley <philipoakley@iee.email> writes:
>> 
>> > I think part of Todd's question was how the tag and uncompressed archive
>> > 'checksums' (e.g. hashes) relate to each other and where those
>> > guarantees come from.
>> 
>> There is no such linkage, and there are no guarantees.  The trust
>> you may or may not have on the PGP key that signs the tag and the
>> checksums of the tarball is the only source of such assurance.
>> 
>> More importantly, I do not think there can be any such linkage
>> between the Git tree and release tarball:
>
> Hmm... I've actually considered writing a tool that would verify whether a
> tarball corresponds to a signed tag/commit. It should be entirely possible,
> no?

I was saying "I have this git commit (or tree) object name.  Compute
the hash for a .tar archive that would contain the contents of that
tree." has no unique answer.

You are solving a different problem: "I have this tar archive; what
git tree object would I get if I extract this archive to an empty
directory and said 'git add . && git write-tree'?".

I agree that one is computable.

Of course, even that reverse problem will break once we consider the
release tarball generation procedure where we _add_ some generated
files that are not in the Git tree, for builder's convenience.
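The forward direction has no unique answer because tar headers carry metadata that the Git tree does not record, such as timestamps, owner names, member order and header format. Compression adds more. The minimal Python sketch below, using made-up file names, archives identical paths and contents twice and changes only the timestamp:

```python
import hashlib
import io
import tarfile


def make_tar(files: dict, mtime: int) -> bytes:
    """Archive {path: content} with fixed mode 0644 and the given timestamp."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, content in sorted(files.items()):
            info = tarfile.TarInfo(path)
            info.size = len(content)
            info.mode = 0o644
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


files = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
first = make_tar(files, mtime=1644000000)
second = make_tar(files, mtime=1644000001)

# Identical paths and contents (hence an identical Git tree) ...
# ... but the archives, and therefore their checksums, differ.
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())  # False
```

`git archive` removes one source of variation: when given a commit or tag, it uses the commit time as the file modification time (its documentation notes that a bare tree ID gets the current time instead). Still, nothing obliges a release process to produce exactly those bytes.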

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-07 13:15             ` Konstantin Ryabitsev
@ 2022-02-07 21:08               ` Gamblin, Todd
  0 siblings, 0 replies; 19+ messages in thread
From: Gamblin, Todd @ 2022-02-07 21:08 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Philip Oakley, Johannes Sixt, Junio C Hamano, git@vger.kernel.org

> On Mon, Feb 07, 2022 at 08:15:58AM +0000, Gamblin, Todd wrote:
>> In our case, the initial trust doesn’t come from a PGP signature — it comes
>> (at least for now) from having cloned the package repository from GitHub.
> 
> Not really the case, if you're relying on a particular commit hash, as you
> state. Once you specify a target hash, you don't really have to care where the
> repository came from -- the hash is either going to be there and be valid, or
> it's not going to be there.

Not to belabor the point, as I think we agree, but there are two clones going on in my example:

1. Spack the package manager is hosted on GitHub.  You clone the repository and run bin/spack out of the repository to use it.  Users will clone either `develop` (the default branch) or some release branch — but they won’t have a commit hash for that.  This is just how they get the package manager and its built-in package repo in the first place.

2. In the spack repo is a repository full of package descriptions.  Those point to sources for things spack can build, and they may do it by commit hash or by tarball URL and sha256.  If spack sees a source listed by commit hash, spack clones it (at that hash) before building.

In (1), since you do not have a hash, you’re trusting that GitHub gave you the right repo and that the project maintained its branches well.  This is why I called it “initial trust”.  In (2), that trust enables you to have confidence in the hashes in the package.py files.

I think we both agree that if you have a sha1 hash from a trusted source, you can be assured that it’s accurate, regardless of where the repo came from.

-Todd







> On Feb 7, 2022, at 5:15 AM, Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> 
> On Mon, Feb 07, 2022 at 08:15:58AM +0000, Gamblin, Todd wrote:
>> In our case, the initial trust doesn’t come from a PGP signature — it comes
>> (at least for now) from having cloned the package repository from GitHub.
> 
> Not really the case, if you're relying on a particular commit hash, as you
> state. Once you specify a target hash, you don't really have to care where the
> repository came from -- the hash is either going to be there and be valid, or
> it's not going to be there.
> 
> It only matters where the person who picked that hash cloned the repository
> from and what steps they made to verify that it is a legitimate commit. If "I
> cloned this repository from github" is sufficient for your needs, then that's
> fine. The alternative is to use PGP verification, but in either case once you
> pick a hash to use, you can rely on git to do all the rest.
> 
>> That said, I guess I do still have one more question — how soon will git
>> notice that a given repo is corrupted/tampered with (insofar as sha1 can do
>> that)?  On checkout?
> 
> Yes. I've asked this question before as well:
> https://lore.kernel.org/git/20190829141010.GD1797@sigill.intra.peff.net/
> 
> The relevant bit:
> 
>    Then yes, there is no need to fsck. When the objects were received on
>    the server side (by push) and then again when you got them from the
>    server (by clone), their sha1s were recomputed from scratch, not
>    trusting the sender at all in either case.
> 
> -K


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-07 20:57           ` Junio C Hamano
@ 2022-02-07 21:34             ` Konstantin Ryabitsev
  2022-02-07 22:29               ` Gamblin, Todd
  2022-02-07 22:49               ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Konstantin Ryabitsev @ 2022-02-07 21:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Philip Oakley, Gamblin, Todd, git@vger.kernel.org

On Mon, Feb 07, 2022 at 12:57:55PM -0800, Junio C Hamano wrote:
> You are solving a different problem: "I have this tar archive; what
> git tree object would I get if I extract this archive to an empty
> directory and said 'git add . && git write-tree'?".
> 
> I agree that one is computable.

So, I was brainstorming about this today, and I'm curious if you think this
would be a useful feature to have, maybe even natively?

E.g. here's a scenario:

"git archive -S <signed-object>" creates an additional file that is added to
the generated tar/zip archive -- for example, a ${prefix}.GIT_ARCHIVE_SIG. That
file contains the raw contents of the signed tag and/or the signed commit.

"git verify-archive" would look for a toplevel .GIT_ARCHIVE_SIG file. If it's
present, it would verify the signature on these "detached" signed objects to
get a trusted tree hash. Then it would compute the tree hash of the tar
archive (minus the .GIT_ARCHIVE_SIG file) to see if it matches.

In my mind, that would provide the following benefits over the current
practice of detached .sig files:

1. environments like github/git.kernel.org would be able to create verifiable
   snapshot archives using an existing set of signed objects
2. packagers would be able to perform cryptographic verification without
   needing to track any extra sources like corresponding .sig files; they
   would just need to add a build-time dependency on git (plus whatever it
   calls for cryptographic verification, such as gnupg or openssh)
3. this would automatically support all git-native signature mechanisms like
   openssh and whatever else gets added in the future

Does this idea have any merit, or is it too fragile/crazy to bother?

> Of course, even that reverse problem will break once we consider the
> release tarball generation procedure where we _add_ some generated
> files that are not in the Git tree, for builder's convenience.

Yes, but it's increasingly rare and many build instructions now specifically
allow for things like "first, run autoconf if you don't already have a
configure file", etc.

-K
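The `git verify-archive` flow described above can be modeled in Python to see what it would have to compute. Everything here is hypothetical: neither `git archive -S` nor `git verify-archive` exists. The sketch assumes `.GIT_ARCHIVE_SIG` sits at the top of the prefix directory and holds only a raw commit object whose signature has already been checked. It handles regular files only and does no path sanitizing.

```python
import hashlib
import io
import tarfile

# Hypothetical name from the proposal above; no released Git writes or reads it.
SIG_NAME = ".GIT_ARCHIVE_SIG"


def object_id(obj_type: str, content: bytes) -> bytes:
    return hashlib.sha1(f"{obj_type} {len(content)}\0".encode("ascii") + content).digest()


def tree_id(node: dict) -> bytes:
    entries = []
    for name, value in node.items():
        if isinstance(value, dict):
            entries.append((name + "/", b"40000", name, tree_id(value)))
        else:
            entries.append((name, value[0], name, value[1]))
    entries.sort(key=lambda e: e[0].encode("utf-8"))  # Git's tree entry order
    body = b"".join(m + b" " + n.encode("utf-8") + b"\0" + oid for _, m, n, oid in entries)
    return object_id("tree", body)


def verify_archive(tar_bytes: bytes, prefix: str) -> bool:
    """Compare the tree named by the embedded commit with the tree of everything else."""
    root: dict = {}
    signed_object = None
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        for member in tar.getmembers():
            if member.isdir():
                continue
            if not member.isreg() or not member.name.startswith(prefix):
                raise ValueError(f"unexpected member: {member.name}")
            rel = member.name[len(prefix):]
            data = tar.extractfile(member).read()
            if rel == SIG_NAME:
                signed_object = data  # left out of the tree computation
                continue
            *dirs, leaf = rel.split("/")
            node = root
            for d in dirs:
                node = node.setdefault(d, {})
                if not isinstance(node, dict):
                    raise ValueError(f"path conflict at {member.name}")
            mode = b"100755" if member.mode & 0o100 else b"100644"
            node[leaf] = (mode, object_id("blob", data))
    if signed_object is None:
        return False
    # Assumed already done: checking the PGP/SSH signature inside signed_object.
    # A raw commit object always starts with "tree <hex>".
    first_line = signed_object.split(b"\n", 1)[0]
    if not first_line.startswith(b"tree "):
        return False
    return first_line[len(b"tree "):].decode("ascii") == tree_id(root).hex()
```

A variant that also stores a signed tag would need one more link: checking that the commit bytes hash to the name the tag records.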

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-07 21:34             ` Konstantin Ryabitsev
@ 2022-02-07 22:29               ` Gamblin, Todd
  2022-02-07 22:46                 ` Konstantin Ryabitsev
  2022-02-07 22:49               ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Gamblin, Todd @ 2022-02-07 22:29 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Junio C Hamano, Philip Oakley, git@vger.kernel.org

> 2. packagers would be able to perform cryptographic verification without
>   needing to track any extra sources like corresponding .sig files; they
>   would just need to add a build-time dependency on git (plus whatever it
>   calls for cryptographic verification, such as gnupg or openssh)

This is a cool idea, but tar/gzip/etc. are vulnerable to input attacks (or at least there have been CVEs in the past), so this does not eliminate the need to verify a downloaded .tar or .tar.gz file independently.  You can verify the contents of the tar, but to do that you have to expand it, and to do that you’re still passing untrusted input to tar.

It would be really interesting if it were possible to create an incrementally verifiable file format for this: git objects (commits, trees, blobs) carry their sizes in their headers, so it seems possible to string a bunch of them together in a file and make that work.  I’m just not sure if the format could be compressed and also trusted.  I don’t know enough details of git’s own storage to say whether, e.g., pack files could help.

-Todd
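Git embeds each object's size in a short header (`<type> <size>\0`), and that header is hashed along with the content. The hedged Python sketch below shows how a stream of such records could be checked one object at a time. It assumes each expected name comes from an already-trusted parent, for example the entries of a verified tree. This is not an existing Git file format; real pack files add zlib compression and deltas on top, and `read_object` is a hypothetical helper.

```python
import hashlib
import io
from typing import BinaryIO, Tuple

OBJECT_TYPES = {b"blob", b"tree", b"commit", b"tag"}


def read_object(stream: BinaryIO, expected_id: str, max_size: int = 1 << 30) -> Tuple[str, bytes]:
    """Read one '<type> <size>\\0<payload>' record and verify it before returning it."""
    header = b""
    while True:
        ch = stream.read(1)
        if not ch:
            raise ValueError("truncated header")
        if ch == b"\0":
            break
        header += ch
        if len(header) > 32:
            raise ValueError("header too long")
    obj_type, _, size_text = header.partition(b" ")
    if obj_type not in OBJECT_TYPES or not size_text.isdigit():
        raise ValueError("malformed header")
    size = int(size_text)
    if size > max_size:
        raise ValueError("declared size too large")
    # The size is known up front, so we never read past this object's end.
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("truncated payload")
    if hashlib.sha1(header + b"\0" + payload).hexdigest() != expected_id:
        raise ValueError("object does not match its expected name")
    return obj_type.decode("ascii"), payload


stream = io.BytesIO(b"blob 6\0hello\n" + b"blob 0\0")
print(read_object(stream, "ce013625030ba8dba906f756967f9e9ca394464a"))  # ('blob', b'hello\n')
print(read_object(stream, "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"))  # ('blob', b'')
```

The declared size is checked against a cap before reading, and the hash is checked before the payload is returned, so a consumer never acts on unverified content. It does still parse a few header bytes from an untrusted source.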



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-07 22:29               ` Gamblin, Todd
@ 2022-02-07 22:46                 ` Konstantin Ryabitsev
  2022-02-08  6:23                   ` Gamblin, Todd
  0 siblings, 1 reply; 19+ messages in thread
From: Konstantin Ryabitsev @ 2022-02-07 22:46 UTC (permalink / raw)
  To: Gamblin, Todd; +Cc: Junio C Hamano, Philip Oakley, git@vger.kernel.org

On Mon, Feb 07, 2022 at 10:29:37PM +0000, Gamblin, Todd wrote:
> > 2. packagers would be able to perform cryptographic verification without
> >   needing to track any extra sources like corresponding .sig files; they
> >   would just need to add a build-time dependency on git (plus whatever it
> >   calls for cryptographic verification, such as gnupg or openssh)
> 
> This is a cool idea, but tar/gzip/etc. are vulnerable to input attacks (or
> at least there have been CVEs in the past), so this does not eliminate the
> need to verify a downloaded .tar or .tar.gz file independently.  You can
> verify the contents of the tar, but to do that you have to expand it, and to
> do that you’re still passing untrusted input to tar.

That's not really different from what git does when it clones a remote
repository to run "git verify-tag". It still accepts untrusted input from the
remote server, performs a lot of compression/decompression operations, etc, so
this is not introducing anything that git isn't already required to do.

I know there's a lot to be said about the simplicity of just computing a
signature over file bytes, but there are features you end up sacrificing, such
as ability to provide a single signature for multiple compression types,
adding a better compression algorithm in the future, or simply recompressing
with better flags in a long background process.

My goal is to improve the current situation where we're actually doing pretty
good for signed in-git objects, but none of that is carried over to packaging
systems. The only effort I know in that area is sigstore, but it requires
quite a bit of work to properly use on the part of the project maintainer,
whereas it would be great to be able to say "just do git tag -s and the
packaging systems will be able to use that."

-K
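Python's standard codecs illustrate the trade-off of signing file bytes. The same uncompressed archive yields a different digest for each compressed file, yet every one of them decompresses to identical bytes. A minimal sketch with a made-up one-file archive:

```python
import bz2
import gzip
import hashlib
import io
import lzma
import tarfile

# Three stdlib codecs standing in for .tar.gz, .tar.bz2 and .tar.xz downloads.
CODECS = {
    "gz": (lambda data: gzip.compress(data, mtime=0), gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def tiny_tar() -> bytes:
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("proj/README")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compare_codecs(payload: bytes) -> dict:
    """Map codec -> (sha256 of compressed file, sha256 after decompressing it)."""
    result = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(payload)
        result[name] = (hashlib.sha256(packed).hexdigest(),
                        hashlib.sha256(decompress(packed)).hexdigest())
    return result


for codec, (outer, inner) in compare_codecs(tiny_tar()).items():
    print(codec, outer[:12], inner[:12])
```

A signature bound to the Git tree, as proposed, sits one level further in. It survives recompression and even changes to tar metadata, at the price of having to decompress and parse the archive before checking it.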

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-07 21:34             ` Konstantin Ryabitsev
  2022-02-07 22:29               ` Gamblin, Todd
@ 2022-02-07 22:49               ` Junio C Hamano
  2022-02-07 23:02                 ` Konstantin Ryabitsev
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2022-02-07 22:49 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Philip Oakley, Gamblin, Todd, git@vger.kernel.org

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> On Mon, Feb 07, 2022 at 12:57:55PM -0800, Junio C Hamano wrote:
>> You are solving a different problem: "I have this tar archive; what
>> git tree object would I get if I extract this archive to an empty
>> directory and said 'git add . && git write-tree'?".
>> 
>> I agree that one is computable.
>
> So, I was brainstorming about this today, and I'm curious if you think this
> would be a useful feature to have, maybe even natively?
>
> E.g. here's a scenario:
>
> "git archive -S <signed-object>" creates an additional file that is added to
> the generated tar/zip archive -- for example, a ${prefix}.GIT_ARCHIVE_SIG. That
> file contains the raw contents of the signed tag and/or the signed commit.
>
> "git verify-archive" would look for a toplevel .GIT_ARCHIVE_SIG file. If it's
> present, it would verify the signature on these "detached" signed objects to
> get a trusted tree hash. Then it would compute the tree hash of the tar
> archive (minus the .GIT_ARCHIVE_SIG file) to see if it matches.
>
> In my mind, that would provide the following benefits over the current
> practice of detached .sig files:
>
> 1. environments like github/git.kernel.org would be able to create verifiable
>    snapshot archives using an existing set of signed objects
> 2. packagers would be able to perform cryptographic verification without
>    needing to track any extra sources like corresponding .sig files; they
>    would just need to add a build-time dependency on git (plus whatever it
>    calls for cryptographic verification, such as gnupg or openssh)
> 3. this would automatically support all git-native signature mechanisms like
>    openssh and whatever else gets added in the future
>
> Does this idea have any merit, or is it too fragile/crazy to bother?

I may choose details differently at implementation level (instead of
an extra file, I'd see if we can add it as pax_extended_header, for
example), but I think that is workable and might even be useful,
provided I am not misunderstanding your idea, so let me try
rephrasing to see how it would work.

Given a signed commit or a signed tag that points at a commit, your
enhanced "git archive" would create a .tar file with the contents of
the tree object, and add copies of the signed objects that tell what tree
object the archive ought to have.  E.g. if you start from a signed
tag, "git cat-file tag $tag" output would allow you to learn the
object name of the tagged object, and to verify the PGP signature
embedded in the tag, but it is likely that the tagged object is a
commit, not a tree, so you'd also need to include "git cat-file
commit $tag^{commit}".  So you'd store the raw contents of the tag
(so that we have a hash-protected record of commit object name), and
the commit (so that we have a hash-protected record of tree object
name).

You as the recipient will find these in the tarball:

 - the files that are supposed to be the contents of tree X.

 - the raw contents of the commit C that is supposed to record the
   tree X.

 - the raw contents of the tag T that is supposed to point at the
   commit C.

Starting from the contents of tag T, which is PGP signed, you know
that the signer wanted to call commit C with the name of the tag T.
Then, taking the raw contents that allegedly are from commit C, you can
"git hash-object -t commit" them to verify that they indeed hash down
to C (hence, they are what the signer wanted to give you), and find the
name of the tree object X the commit records.  And when you added
all the blobs contained in the tarball (and nothing else) to the
index and ran write-tree on the resulting index, you would know what
tree object the tarball contained, and if it hashes down to X, you
know that the cryptographic hash chain starting from PGP signature
on T attests that that tarball matches what the signer wanted you
to have.
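The hash chain walked through above (T to C to X) can be sketched in Python. The sketch assumes a SHA-1 repository and that the PGP signature on tag T has already been verified, for example with `git verify-tag`. It also assumes the tree name of the unpacked files was computed separately. `verify_chain` and `header_field` are hypothetical helpers.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


def header_field(raw: bytes, key: bytes) -> bytes:
    """Value of the first '<key> <value>' header line; headers end at the first blank line."""
    for line in raw.split(b"\n"):
        if not line:
            break
        if line.startswith(key + b" "):
            return line[len(key) + 1:]
    raise ValueError(f"missing {key.decode()} header")


def verify_chain(raw_tag: bytes, raw_commit: bytes, files_tree_id: str) -> bool:
    """Tag T -> commit C -> tree X, assuming T's signature has already been verified."""
    if header_field(raw_tag, b"type") != b"commit":
        return False
    # The shipped commit bytes must hash to exactly the name recorded in the tag ...
    if object_id("commit", raw_commit) != header_field(raw_tag, b"object").decode("ascii"):
        return False
    # ... and the tree that commit records must be the tree computed from the files.
    return header_field(raw_commit, b"tree").decode("ascii") == files_tree_id
```

For a snapshot of a signed commit rather than a tag, the tag checks drop out and the commit's own signature is the anchor.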

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-07 22:49               ` Junio C Hamano
@ 2022-02-07 23:02                 ` Konstantin Ryabitsev
  0 siblings, 0 replies; 19+ messages in thread
From: Konstantin Ryabitsev @ 2022-02-07 23:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Philip Oakley, Gamblin, Todd, git@vger.kernel.org

On Mon, Feb 07, 2022 at 02:49:16PM -0800, Junio C Hamano wrote:
> Given a signed commit or a signed tag that points at a commit, your
> enhanced "git archive" would create a .tar file with the contents of
> the tree object, and add copies of the signed objects that tell what tree
> object the archive ought to have.  E.g. if you start from a signed
> tag, "git cat-file tag $tag" output would allow you to learn the
> object name of the tagged object, and to verify the PGP signature
> embedded in the tag, but it is likely that the tagged object is a
> commit, not a tree, so you'd also need to include "git cat-file
> commit $tag^{commit}". 

Correct, unless it's a snapshot of a signed commit, not of a tag (cgit, for
example, makes them available for download). In this case we only need to have
the cat-file contents of the commit.

> So you'd store the raw contents of the tag
> (so that we have a hash-protected record of commit object name), and
> the commit (so that we have a hash-protected record of tree object
> name).
> 
> You as the recipient will find these in the tarball:
> 
>  - the files that are supposed to be the contents of tree X.
> 
>  - the raw contents of the commit C that is supposed to record the
>    tree X.
> 
>  - the raw contents of the tag T that is supposed to point at the
>    commit C.
> 
> Starting from the contents of tag T, which is PGP signed, you know
> that the signer wanted to call commit C with the name of the tag T.
> Then, taking the raw contents that allegedly are from commit C, you can
> "git hash-object -t commit" them to verify that they indeed hash down
> to C (hence, they are what the signer wanted to give you), and find the
> name of the tree object X the commit records.  And when you added
> all the blobs contained in the tarball (and nothing else) to the
> index and ran write-tree on the resulting index, you would know what
> tree object the tarball contained, and if it hashes down to X, you
> know that the cryptographic hash chain starting from PGP signature
> on T attests that that tarball matches what the signer wanted you
> to have.

Exactly right. It would be slightly more complicated for things like openssh
signatures, since then you have to worry about where the allowed_signers file
comes from, but these are implementation minutiae. Even if we start with just
support for PGP signatures, that would already be a great improvement over
where things are with snapshot downloads right now.

Best regards,
-K

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Commit SHA1 == SHA1 checksum?
  2022-02-07 22:46                 ` Konstantin Ryabitsev
@ 2022-02-08  6:23                   ` Gamblin, Todd
  0 siblings, 0 replies; 19+ messages in thread
From: Gamblin, Todd @ 2022-02-08  6:23 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Junio C Hamano, Philip Oakley, git@vger.kernel.org



> On Feb 7, 2022, at 2:46 PM, Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> 
> On Mon, Feb 07, 2022 at 10:29:37PM +0000, Gamblin, Todd wrote:
>>> 2. packagers would be able to perform cryptographic verification without
>>>  needing to track any extra sources like corresponding .sig files; they
>>>  would just need to add a build-time dependency on git (plus whatever it
>>>  calls for cryptographic verification, such as gnupg or openssh)
>> 
>> This is a cool idea, but tar/gzip/etc. are vulnerable to input attacks (or
>> at least there have been CVEs in the past), so this does not eliminate the
>> need to verify a downloaded .tar or .tar.gz file independently.  You can
>> verify the contents of the tar, but to do that you have to expand it, and to
>> do that you’re still passing untrusted input to tar.
> 
> That's not really different from what git does when it clones a remote
> repository to run "git verify-tag". It still accepts untrusted input from the
> remote server, performs a lot of compression/decompression operations, etc, so
> this is not introducing anything that git isn't already required to do.
> 
> I know there's a lot to be said about the simplicity of just computing a
> signature over file bytes, but there are features you end up sacrificing, such
> as ability to provide a single signature for multiple compression types,
> adding a better compression algorithm in the future, or simply recompressing
> with better flags in a long background process.
> 
> My goal is to improve the current situation where we're actually doing pretty
> good for signed in-git objects, but none of that is carried over to packaging
> systems. The only effort I know in that area is sigstore, but it requires
> quite a bit of work to properly use on the part of the project maintainer,
> whereas it would be great to be able to say "just do git tag -s and the
> packaging systems will be able to use that."

I agree this would be really nice.  If the tarball (or whatever) created could also be signed, so that it could be trusted regardless of the particular server you fetch it from, it might work as a nice packaging format.  Maybe you could have a signed header with the hash of the tarball that’s produced?  You wouldn’t really need a signed tag in that case.

Our use for this would be to host these signed git archives (.gar files?) on mirrors — which we may or may not trust.  If I can get around the hostile tarball issue I’d be super excited about the idea.

-Todd
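For the hostile-tarball concern, ordering is what matters: check the download's digest first, and only then hand the bytes to tar or gzip code. The minimal Python sketch below assumes the expected SHA-256 comes from somewhere already trusted, such as a signed header as suggested above or a checksum in a package recipe. Where that digest comes from, and how such a header would be signed, are out of scope. The function names are hypothetical.

```python
import hashlib
import io
import tarfile
from typing import BinaryIO


def read_verified(stream: BinaryIO, expected_sha256: str, chunk_size: int = 1 << 20) -> bytes:
    """Read the whole download, refusing it unless its SHA-256 matches."""
    digest = hashlib.sha256()
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        chunks.append(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        raise ValueError("digest mismatch; archive not opened")
    return b"".join(chunks)


def open_verified(stream: BinaryIO, expected_sha256: str) -> tarfile.TarFile:
    """Hand bytes to the tar/decompression code only after the digest check passed."""
    data = read_verified(stream, expected_sha256)
    # Mode "r:*" lets tarfile detect gzip/bz2/xz itself, on already-trusted bytes.
    return tarfile.open(fileobj=io.BytesIO(data), mode="r:*")
```

For large files the bytes would go to a temporary file instead of memory. This ordering is also why a detached digest stays useful alongside a signature that is bound to the Git tree.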



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-02-08  6:24 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-05  1:19 Commit SHA1 == SHA1 checksum? Gamblin, Todd
2022-02-06  0:22 ` Philip Oakley
2022-02-06  9:00   ` Gamblin, Todd
2022-02-06 10:23     ` Johannes Sixt
2022-02-06 10:15   ` Junio C Hamano
2022-02-06 19:25     ` Philip Oakley
2022-02-06 20:02       ` Junio C Hamano
2022-02-06 21:33         ` Philip Oakley
2022-02-07  8:15           ` Gamblin, Todd
2022-02-07 13:15             ` Konstantin Ryabitsev
2022-02-07 21:08               ` Gamblin, Todd
2022-02-07 13:32         ` Konstantin Ryabitsev
2022-02-07 20:57           ` Junio C Hamano
2022-02-07 21:34             ` Konstantin Ryabitsev
2022-02-07 22:29               ` Gamblin, Todd
2022-02-07 22:46                 ` Konstantin Ryabitsev
2022-02-08  6:23                   ` Gamblin, Todd
2022-02-07 22:49               ` Junio C Hamano
2022-02-07 23:02                 ` Konstantin Ryabitsev

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).