git@vger.kernel.org mailing list mirror (one of many)
From: Duy Nguyen <pclouds@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Joey Hess <id@joeyh.name>, GIT Mailing List <git@vger.kernel.org>
Subject: Re: weaning distributions off tarballs: extended verification of git tags
Date: Tue, 3 Mar 2015 07:42:09 +0700
Message-ID: <CACsJy8ALQ=Hs2vnpiNxbp-n_sZvNahhtE4N2H-4_Jma4yo6rVQ@mail.gmail.com>
In-Reply-To: <xmqqsidn7ymg.fsf@gitster.dls.corp.google.com>

On Tue, Mar 3, 2015 at 6:44 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> On Tue, Mar 3, 2015 at 1:12 AM, Joey Hess <id@joeyh.name> wrote:
>>> I support this proposal, as someone who no longer releases tarballs
>>> of my software, when I can possibly avoid it. I have worried about
>>> signed tags / commits only being a SHA1 break away from useless.
>>>
>>> As to the implementation, checksumming the collection of raw objects is
>>> certainly superior to tar. Colin had suggested sorting the objects by
>>> checksum, but I don't think that is necessary. Just stream the commit
>>> object, then its tree object, followed by the content of each object
>>> listed in the tree, recursing into subtrees as necessary. That will be a
>>> stable stream for a given commit, or tree.
>>
>> It could be simplified a bit by using ls-tree -r (so you basically
>> have a single big tree). Then hash commit, ls-tree -r output and all
>> blobs pointed to by ls-tree in listed order.
>
> What problem are you trying to solve here, though, by deliberately
> deviating from what Git internally uses to store these objects?  If it is
> OK to ignore the tree boundary, then you probably do not even need
> trees in this secondary hash for validation in the first place.
>
> For example, you can hash a stream:
>
>     <commit object contents> +
>     N * (<pathname> + NUL + <blob object contents>)
>
> as long as the <pathname>s are sorted in a predictable order (like
> in "the index order") in the output.  That would be even simpler (I
> am not saying it is necessarily better, and by inference neither is
> your "simplification").

I did nearly that [1]. But this morning I realized that trees carry file
permissions (modes). We should keep those in the final checksum as well.
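
A minimal sketch of the flat stream quoted above with the file mode added
to each entry. The function name, the "<mode> SP <pathname> NUL" layout,
the SHA-256 default and the sample data are all assumptions made for
illustration, not something proposed in this thread. Entries are sorted
by raw pathname bytes as a stand-in for "the index order".

```python
import hashlib


def flat_stream_digest(commit: bytes, entries: dict, algo: str = "sha256") -> str:
    """Hash <commit> + N * (<mode> SP <pathname> NUL <blob contents>).

    `entries` maps a full pathname (bytes) to a (mode, blob) pair, e.g.
    {b"bin/run.sh": (b"100755", b"#!/bin/sh\n")}.
    """
    h = hashlib.new(algo)
    h.update(commit)
    # Plain byte-wise sort of full pathnames, used here as a stand-in
    # for "the index order".
    for path in sorted(entries):
        mode, blob = entries[path]
        h.update(mode + b" " + path + b"\0" + blob)
    return h.hexdigest()


# Made-up commit body and files; a real tool would read them from Git.
commit = b"tree 0000\nauthor A U Thor\n\ndemo\n"
files = {
    b"README": (b"100644", b"hello\n"),
    b"bin/run.sh": (b"100755", b"#!/bin/sh\n"),
}
print(flat_stream_digest(commit, files))
```

With the mode in the stream, changing only the executable bit of a file
changes the digest. Note that this flat form has no length fields, so
blob boundaries are not explicit in the stream.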

> Now, if the final objective is to replace signature of tarballs,
> does it matter to cover the commit object, or is it sufficient to
> cover the tree contents?
>
> Among the ideas raised so far, I like what Joey suggested, combined
> with "each should have '<type> <length>NUL' header" from Sam Vilain
> the best.  That is, hash the stream:
>
>     "commit <length>" NUL + <commit object contents> +
>     "tree <length>" NUL + <top level tree contents> +
>     ... list the entries in the order you would find by
>     ... some defined traversal order people can agree on.
>
> with whatever the preferred strong hash function of the age is.

A bit harder to script, but simpler to provide from cat-file, I think.
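
A minimal sketch of the header-framed stream quoted above. The helper
names, the SHA-256 default and the sample objects are assumptions made
for illustration. The traversal order is left to the caller, since the
thread has not settled on one. The "<type> <length>" NUL header is the
same framing Git uses when naming an object, so SHA-1 over a single
framed blob reproduces that blob's object id.

```python
import hashlib


def framed(obj_type: bytes, body: bytes) -> bytes:
    """Return b"<type> <length>" NUL + body."""
    return obj_type + b" " + str(len(body)).encode("ascii") + b"\0" + body


def framed_stream_digest(objects, algo: str = "sha256") -> str:
    """Hash the framed objects in the order the caller supplies them."""
    h = hashlib.new(algo)
    for obj_type, body in objects:
        h.update(framed(obj_type, body))
    return h.hexdigest()


# Made-up object bodies; a real tool would read them from the repository.
objects = [
    (b"commit", b"tree 0000\n\ndemo\n"),
    (b"tree", b"100644 README\0" + bytes(20)),
    (b"blob", b"hello\n"),
]
print(framed_stream_digest(objects))
```

Because every object carries its own length, the boundaries between
objects in the stream are explicit: two blobs "ab" and "c" hash
differently from "a" and "bc".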

[1] http://article.gmane.org/gmane.comp.version-control.git/260211
-- 
Duy
