git@vger.kernel.org mailing list mirror (one of many)
From: "Gamblin, Todd" <gamblin2@llnl.gov>
To: Philip Oakley <philipoakley@iee.email>,
	Johannes Sixt <j6t@kdbg.org>, Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Commit SHA1 == SHA1 checksum?
Date: Mon, 7 Feb 2022 08:15:58 +0000
Message-ID: <734DE4FA-E4B1-43CB-89EE-3C200FA21F4F@llnl.gov>
In-Reply-To: <13f21a57-1519-5ace-30e8-def598fad38b@iee.email>

Thanks everyone for the responses.  I think Junio’s comment summed up what I was asking nicely:

> As much or as little trust you have
> in SHA-1 in validating tarball.tar with its known SHA-1 checksum,
> you can trust to the same degree that the commit that is pointed by
> a tag is what the person who signed (with GPG) the tag wanted the
> tag to point at, and in turn the trees and blobs in that commit are
> what the signer wanted to have in that tagged commit, ad infinitum,
> in the space dimension.  At the same time, a commit object records
> the hash of the commit objects that are its parents, the whole
> history of the project going back to inception can be trusted to the
> same degree in the time dimension.

In our case, the initial trust doesn’t come from a PGP signature — it comes (at least for now) from having cloned the package repository from GitHub. So you trust GitHub by cloning over https, and you trust the Spack maintainers to only merge safe things into `develop` or some release branch.  A package description in the repo might have some versions specified by commit, like this:

	https://raw.githubusercontent.com/spack/spack/5ff72ca/var/spack/repos/builtin/packages/acts/package.py

We use the specified commits to build packages from source, or to create (hopefully) reproducible binary packages. Anyway, like the PGP case, you’re given a commit hash from some trusted source.  The question was really whether you can rely on the hash like a sha1 checksum — and it seems like you can.
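To make the "commit hash as checksum" idea concrete, here is a minimal Python sketch of how a SHA-1 object name is derived from an object's type, length, and content, and why a commit's name covers its tree and parents. The `commit_body` helper and the author/committer identity lines are made up for illustration; real commits carry real identities and timestamps, and this is not git's own code.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <length>\\0' followed by the object content."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], message: str) -> bytes:
    """Build a simplified commit body (hypothetical identity lines)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append("author A U Thor <author@example.com> 0 +0000")
    lines.append("committer A U Thor <author@example.com> 0 +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


empty_tree = git_object_id("tree", b"")
root = git_object_id("commit", commit_body(empty_tree, [], "initial"))
child = git_object_id("commit", commit_body(empty_tree, [root], "second"))

# Same tree and message, but a different parent: the commit name changes,
# so a commit hash pins the history behind it as well as the content.
forged = git_object_id("commit", commit_body(empty_tree, ["0" * 40], "second"))

print(git_object_id("blob", b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
print(child != forged)                    # True
```

The blob value matches what `git hash-object` reports for a file containing `hello` and a newline, which is a quick way to sanity-check the header format.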

That said, I guess I do still have one more question — how soon will git notice that a given repo is corrupted/tampered with (insofar as sha1 can do that)?  On checkout?
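Independent of when git itself performs its checks, the kind of check being asked about can be sketched by hand: re-hash a stored object and compare the result with the name it is filed under. The Python below is a rough illustration that covers loose objects only (zlib-compressed files under `objects/`), not packfiles, and the helper names are hypothetical. For a real repository, `git fsck` is the tool that verifies objects.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir: str, kind: str, body: bytes) -> str:
    """Store an object in the loose-object layout and return its name."""
    raw = f"{kind} {len(body)}\0".encode("ascii") + body
    name = hashlib.sha1(raw).hexdigest()
    path = os.path.join(objects_dir, name[:2], name[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))
    return name


def loose_object_is_intact(objects_dir: str, name: str) -> bool:
    """Re-hash the stored bytes and compare with the name on disk."""
    path = os.path.join(objects_dir, name[:2], name[2:])
    try:
        with open(path, "rb") as f:
            raw = zlib.decompress(f.read())
    except (OSError, zlib.error):
        return False
    return hashlib.sha1(raw).hexdigest() == name


with tempfile.TemporaryDirectory() as objects_dir:
    name = write_loose_object(objects_dir, "blob", b"hello\n")
    ok_before = loose_object_is_intact(objects_dir, name)

    # Simulate tampering: replace the content but keep the file name.
    path = os.path.join(objects_dir, name[:2], name[2:])
    with open(path, "wb") as f:
        f.write(zlib.compress(b"blob 6\0HELLO\n"))
    ok_after = loose_object_is_intact(objects_dir, name)

print(ok_before, ok_after)  # True False
```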

RE: Johannes:
> (How could you even contemplate that it does not? It is the most obvious way to protect the cloner.)

I have always assumed this was the case but never could find anything in the docs saying explicitly what Junio said above.  It is hard for me to imagine git *not* working this way, but I’ve been asked this question by enough of our package maintainers that I thought I’d bring it up here.

RE: Phil:

> Hopefully Todd will be able to clarify if that 'archive vs tag' cross
> check was part of the question, or whether it was primarily focussed on
> the internal Git checks for correctness during clone and fsck.

It wasn’t part of the original question — I was really just asking whether git guarantees that a fresh `git clone` of some commit actually has the stated commit hash.  I realize there’s no relation between the sha1 of a commit and the sha1 or any other hash of its tarball (it’d be a pretty bad hash function if there was).
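As a small illustration of why a tarball checksum cannot be tied to a commit, the Python sketch below builds two in-memory tar archives that hold the same file content and differ only in the recorded mode bits, the kind of difference Junio attributes to settings such as "tar.umask" in the quoted text below. The file name and the two mode values are arbitrary choices for the example, and the archives are built with Python's `tarfile`, not with `git archive`.

```python
import hashlib
import io
import tarfile


def make_tar(mode: int) -> bytes:
    """Build an in-memory tar with one fixed file, varying only its mode."""
    data = b"hello\n"
    info = tarfile.TarInfo(name="pkg/README")
    info.size = len(data)
    info.mtime = 0  # pin the timestamp so only the mode differs
    info.mode = mode
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def file_contents(archive: bytes) -> dict[str, bytes]:
    """Map each regular file in the archive to its content."""
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        return {m.name: tar.extractfile(m).read()
                for m in tar.getmembers() if m.isfile()}


tar_a = make_tar(0o644)
tar_b = make_tar(0o664)

same_contents = file_contents(tar_a) == file_contents(tar_b)
same_checksum = hashlib.sha256(tar_a).digest() == hashlib.sha256(tar_b).digest()
print(same_contents, same_checksum)  # True False
```

Both archives unpack to identical file content, yet their checksums differ, so a tarball checksum identifies one particular archive, not the tree it was made from.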

That said, we are still trying to work out some practical *and secure* way to mirror git commits as a simple download.  I think we need to generate the tarballs ourselves and just add their sha256 checksums to the package.  GitHub generates tarballs too, and their archive generation logic has changed in the past, as Junio described below.  It’s messy because it requires another checksum that may change, but I don’t see a way around it.  We can’t just tar up a git repo: tar and other compression tools can have vulnerabilities, and we want to checksum any input we pass to them.
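The "checksum before unpacking" step could look roughly like the Python sketch below. The function names and the sample file are hypothetical and this is not Spack's actual code. The point is only that the archive bytes are hashed and compared with the recorded sha256 before any tar or decompression tool reads them.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected_sha256: str) -> None:
    """Raise ValueError unless the file matches the recorded checksum."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pkg.tar.gz")
    payload = b"pretend this is an archive"
    with open(path, "wb") as f:
        f.write(payload)
    recorded = hashlib.sha256(payload).hexdigest()

    verify_archive(path, recorded)  # matches, so nothing is raised

    # Simulate a modified download: one extra byte changes the checksum.
    with open(path, "ab") as f:
        f.write(b"!")
    try:
        verify_archive(path, recorded)
        tamper_detected = False
    except ValueError:
        tamper_detected = True

print(tamper_detected)  # True
```

Only after `verify_archive` returns without raising would the file be handed to an extraction tool.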

Thanks again for all the helpful responses.

-Todd



> On Feb 6, 2022, at 1:33 PM, Philip Oakley <philipoakley@iee.email> wrote:
> 
> On 06/02/2022 20:02, Junio C Hamano wrote:
>> Philip Oakley <philipoakley@iee.email> writes:
>> 
>>> I think part of Todd's question was how the tag and uncompressed archive
>>> 'checksums' (e.g. hashes) relate to each other and where those
>>> guarantees come from.
>> There is no such linkage, and there are no guarantees.  The trust
>> you may or may not have on the PGP key that signs the tag and the
>> checksums of the tarball is the only source of such assurance.
>> 
>> More importantly, I do not think there can be any such linkage
>> between the Git tree and release tarball for a few fundamental
>> reasons:
>> 
>> * We add generated files to "git archive" output when creating the
>>   release tarball for builder's convenience, so if you did
>> 
>>       rm -fr temp && git init temp
>>       tar Cxf temp git-$VERSION.tar
>>       git -C temp add . && git -C temp write-tree
>> 
>>   the tree object name that you get out of the last step will not
>>   match the tree object of the version from my archive (interested
>>   parties can study "make dist" for more details).
>> 
>> * Even if we did not add any files to "git archive" output when
>>   creating a release tarball, a tarball that contains all the
>>   directories and files from a given git revision is *NOT* unique.
>>   We do not add randomness to the "git archive" output, just to
>>   make them unstable, but we have made fixes and improvements to
>>   the archive generation logic in the past, and we do reserve the
>>   rights to do so in the future.  And it is not just limited to
>>   "git archive" binary, but how it is driven, e.g. "tar.umask"
>>   settings can affect the mode bits.
> Thanks for the clarification.
> 
> Thus what trust there is, is via the two PGP signatures, rather than
> directly between the tarball and the git repo.
> --
> Philip
> 

