From: Junio C Hamano <gitster@pobox.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Joey Hess <id@joeyh.name>, GIT Mailing List <git@vger.kernel.org>
Subject: Re: weaning distributions off tarballs: extended verification of git tags
Date: Mon, 02 Mar 2015 15:44:55 -0800
Message-ID: <xmqqsidn7ymg.fsf@gitster.dls.corp.google.com>
In-Reply-To: <CACsJy8C3=f=esBrHE8OudSa0nUbCrLaYJtLC2in3p+tcc-d9bw@mail.gmail.com> (Duy Nguyen's message of "Tue, 3 Mar 2015 06:20:26 +0700")

Duy Nguyen <pclouds@gmail.com> writes:
> On Tue, Mar 3, 2015 at 1:12 AM, Joey Hess <id@joeyh.name> wrote:
>> I support this proposal, as someone who no longer releases tarballs
>> of my software, when I can possibly avoid it. I have worried about
>> signed tags / commits only being a SHA1 break away from useless.
>>
>> As to the implementation, checksumming the collection of raw objects is
>> certainly superior to tar. Colin had suggested sorting the objects by
>> checksum, but I don't think that is necessary. Just stream the commit
>> object, then its tree object, followed by the content of each object
>> listed in the tree, recursing into subtrees as necessary. That will be a
>> stable stream for a given commit, or tree.
>
> It could be simplified a bit by using ls-tree -r (so you basically
> have a single big tree). Then hash commit, ls-tree -r output and all
> blobs pointed by ls-tree in listed order.

What problem are you trying to solve here, though, by deliberately
deviating from how Git internally stores these objects? If it is
OK to ignore the tree boundary, then you probably do not even need
trees in this secondary hash for validation in the first place.

For example, you can hash a stream:

    <commit object contents> +
    N * (<pathname> + NUL + <blob object contents>)

as long as the <pathname>s are sorted in a predictable order (e.g.
"the index order"). That would be even simpler (I am not saying it
is necessarily better, and by inference neither is your
"simplification").
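
A minimal Python sketch of the path-plus-blob stream above, assuming the
caller has already collected the raw commit object contents and a mapping
from every full pathname to its blob contents (for instance via
"git cat-file commit <rev>", "git ls-tree -r -z <rev>" and
"git cat-file blob <oid>"). The function name path_blob_digest is
hypothetical, and SHA-512 merely stands in for "a strong hash".

```python
import hashlib


def path_blob_digest(commit_body: bytes, blobs: dict,
                     algo: str = "sha512") -> str:
    """Hash <commit contents> + N * (<pathname> + NUL + <blob contents>).

    `blobs` maps each full path (bytes) in the commit's tree to that
    blob's raw contents (bytes). Paths are sorted bytewise, the same
    order Git uses for index entries, so every verifier builds the same
    stream regardless of how the mapping was filled in.
    """
    h = hashlib.new(algo)
    h.update(commit_body)
    for path in sorted(blobs):
        h.update(path + b"\0" + blobs[path])
    return h.hexdigest()
```

Note that nothing in this stream records where one blob ends and the next
pathname begins: {a: "X", b: "Y"} and {a: "Xb" NUL "Y"} produce identical
bytes and therefore the same digest, which is one argument for the
length-carrying headers in the last proposal.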

I was about to suggest another alternative: pretend that Git
internally used SHA-512 (or whatever hash you want to use) instead
of SHA-1, and compute the object names that way. Recompute the
contents of each tree object by replacing the 20-byte SHA-1 field in
every entry with a field long enough to hold the longer object name
of that element of the tree.
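
The tree half of that alternative can be sketched in Python as follows.
This is an illustration under stated assumptions, not how Git implements
anything: the nested-dict input and the helper names object_name and
tree_name are hypothetical, every file is treated as a regular 100644 blob
(no executables, symlinks or submodules), and entries are sorted the way
Git sorts tree entries, with directory names compared as if they ended in
"/". Passing algo="sha1" should reproduce ordinary Git object names, which
gives a quick sanity check.

```python
import hashlib


def object_name(obj_type: str, body: bytes, algo: str = "sha512") -> bytes:
    """Git-style object name: hash of "<type> <length>\\0" + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).digest()


def tree_name(entries: dict, algo: str = "sha512") -> bytes:
    """Name a tree whose entries map a name (bytes) to blob contents
    (bytes) or to a nested dict (a subtree). Every file is assumed to
    have mode 100644 in this sketch."""
    def sort_key(name: bytes) -> bytes:
        # Git sorts tree entries as if directory names ended in "/".
        return name + b"/" if isinstance(entries[name], dict) else name

    body = b""
    for name in sorted(entries, key=sort_key):
        child = entries[name]
        if isinstance(child, dict):
            mode, oid = b"40000", tree_name(child, algo)
        else:
            mode, oid = b"100644", object_name("blob", child, algo)
        # Same "<mode> <name>\0<id>" entry layout as a real tree object,
        # but <id> is as long as the chosen digest (64 bytes for SHA-512).
        body += mode + b" " + name + b"\0" + oid
    return object_name("tree", body, algo)
```

For example, tree_name({}, "sha1").hex() gives
4b825dc642cb6eb9a060e54bf8d69288fbee4904, Git's well-known empty tree,
while the default SHA-512 names are 64 bytes long.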

But then it hit me: what new value would go in the "parent " field
of the commit object? You cannot have a SHA-512 variant of a commit
object name without recomputing the whole history.

Now, if the final objective is to replace signatures on tarballs,
does it matter whether we cover the commit object, or is it
sufficient to cover the tree contents?

Among the ideas raised so far, I like best what Joey suggested,
combined with Sam Vilain's suggestion that "each should have a
'<type> <length>NUL' header". That is, hash the stream:

    "commit <length>" NUL + <commit object contents> +
    "tree <length>" NUL + <top level tree contents> +
    ... followed by the remaining entries, listed in some
    ... defined traversal order people can agree on,

with whatever strong hash function is preferred at the time.
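
A Python sketch of this header-prefixed stream, under two loud
assumptions: the traversal order is taken to be Joey's (the commit, its
top-level tree, then each entry in the order it appears in the tree
object, recursing depth-first into subtrees), and there are no submodule
(gitlink) entries. The in-memory store and the helpers put, parse_tree and
stream_digest are hypothetical stand-ins for reading objects out of a real
repository (e.g. with "git cat-file").

```python
import hashlib


def put(store: dict, obj_type: str, body: bytes) -> bytes:
    """Add an object to a toy in-memory store, keyed by its Git SHA-1 name."""
    oid = hashlib.sha1(f"{obj_type} {len(body)}".encode() + b"\0" + body).digest()
    store[oid] = (obj_type, body)
    return oid


def parse_tree(body: bytes):
    """Yield (mode, name, oid) for each "<mode> <name>\\0<20-byte oid>" entry."""
    i = 0
    while i < len(body):
        sp = body.index(b" ", i)
        nul = body.index(b"\0", sp)
        yield body[i:sp], body[sp + 1:nul], body[nul + 1:nul + 21]
        i = nul + 21


def stream_digest(store: dict, commit_oid: bytes, algo: str = "sha512") -> str:
    """Hash "<type> <length>\\0<contents>" for the commit, its tree, and
    every object reachable from that tree, in depth-first pre-order."""
    h = hashlib.new(algo)

    def emit(oid: bytes) -> None:
        obj_type, body = store[oid]
        h.update(f"{obj_type} {len(body)}".encode() + b"\0" + body)
        if obj_type == "tree":
            for _mode, _name, child in parse_tree(body):
                emit(child)

    obj_type, body = store[commit_oid]
    h.update(f"{obj_type} {len(body)}".encode() + b"\0" + body)
    # The first header line of a commit is "tree <40-hex-digit SHA-1>".
    tree_hex = body.split(b"\n", 1)[0].split(b" ", 1)[1]
    emit(bytes.fromhex(tree_hex.decode()))
    return h.hexdigest()
```

Because every object carries its own '<type> <length>' header, this stream
is self-delimiting, unlike the bare path/blob variant. A blob reachable
from several paths is simply emitted once per occurrence here;
deduplicating would be a different but equally workable convention, as
long as everyone agrees on it.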