git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Joey Hess <id@joeyh.name>, GIT Mailing List <git@vger.kernel.org>
Subject: Re: weaning distributions off tarballs: extended verification of git tags
Date: Mon, 02 Mar 2015 15:44:55 -0800
Message-ID: <xmqqsidn7ymg.fsf@gitster.dls.corp.google.com>
In-Reply-To: <CACsJy8C3=f=esBrHE8OudSa0nUbCrLaYJtLC2in3p+tcc-d9bw@mail.gmail.com> (Duy Nguyen's message of "Tue, 3 Mar 2015 06:20:26 +0700")

Duy Nguyen <pclouds@gmail.com> writes:

> On Tue, Mar 3, 2015 at 1:12 AM, Joey Hess <id@joeyh.name> wrote:
>> I support this proposal, as someone who no longer releases tarballs
>> of my software, when I can possibly avoid it. I have worried about
>> signed tags / commits only being a SHA1 break away from useless.
>>
>> As to the implementation, checksumming the collection of raw objects is
>> certainly superior to tar. Colin had suggested sorting the objects by
>> checksum, but I don't think that is necessary. Just stream the commit
>> object, then its tree object, followed by the content of each object
>> listed in the tree, recursing into subtrees as necessary. That will be a
>> stable stream for a given commit, or tree.
>
> It could be simplified a bit by using ls-tree -r (so you basically
> have a single big tree). Then hash commit, ls-tree -r output and all
> blobs pointed by ls-tree in listed order.

What problem are you trying to solve here, though, by deliberately
deviating from the way Git internally stores these objects?  If it is
OK to ignore the tree boundary, then you probably do not even need
trees in this secondary hash for validation in the first place.

For example, you can hash a stream:

    <commit object contents> +
    N * (<pathname> + NUL + <blob object contents>)

as long as the <pathname>s are sorted in a predictable order (like
in "the index order") in the output.  That would be even simpler (I
am not saying it is necessarily better, and by inference neither is
your "simplification").
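As a rough illustration of that flat stream, here is a minimal Python sketch. It assumes the commit contents and a pathname-to-blob mapping are already in memory, uses SHA-512 only as an example of a stronger hash, and approximates "the index order" by sorting pathnames bytewise. The helper name `flat_digest` is hypothetical.

```python
import hashlib

def flat_digest(commit_contents: bytes, blobs: dict[bytes, bytes],
                algo: str = "sha512") -> str:
    """Hypothetical helper: hash <commit> + N * (<pathname> NUL <blob>)."""
    h = hashlib.new(algo)
    h.update(commit_contents)
    # Approximate "the index order": pathnames sorted by raw bytes.
    for path in sorted(blobs):
        h.update(path)
        h.update(b"\0")
        h.update(blobs[path])
    return h.hexdigest()
```

Note that blob contents carry no length here, so the stream alone does not mark where one entry ends: a single file `a` containing `xb`, NUL, `y` produces the same bytes as two files `a` (`x`) and `b` (`y`). Length-prefixed headers avoid that ambiguity.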

I was about to suggest another alternative.

    Pretend as if Git internally used SHA-512 (or whatever hash you
    want to use) instead of SHA-1, and compute the object names that
    way.  Recompute the contents of a tree object by replacing the
    20-byte SHA-1 field in each entry with a field of whatever
    length is necessary to hold the longer object names of the
    elements in the tree.

But then a realization hit me: what new value will be placed in the
"parent " field in the commit object?  You cannot have a SHA-512
variant of a commit's object name without recomputing the whole history.

Now, if the final objective is to replace signature of tarballs,
does it matter to cover the commit object, or is it sufficient to
cover the tree contents?

Among the ideas raised so far, I like what Joey suggested, combined
with "each should have '<type> <length>NUL' header" from Sam Vilain
the best.  That is, hash the stream:

    "commit <length>" NUL + <commit object contents> +
    "tree <length>" NUL + <top level tree contents> +
    ... followed by the remaining entries, in whatever
    ... well-defined traversal order people can agree on.

with whatever strong hash function is preferred at the time.
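That stream could be sketched in Python as follows. This is only one possible reading: it assumes a depth-first, pre-order walk that visits tree entries in the order they are stored in the tree object (as in Joey's description), works on an in-memory object store keyed by SHA-1 object name, and skips submodule (`160000`) entries because their commits live in another repository. `sha1_object`, `parse_tree`, `feed` and `commit_digest` are hypothetical helpers.

```python
import hashlib

def sha1_object(store: dict, obj_type: str, content: bytes) -> bytes:
    """Add an object to an in-memory store, keyed by its ordinary Git SHA-1 name."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    oid = hashlib.sha1(header + content).digest()
    store[oid] = (obj_type, content)
    return oid

def parse_tree(content: bytes):
    """Yield (mode, name, oid) from raw tree contents with 20-byte SHA-1 names."""
    i = 0
    while i < len(content):
        sp = content.index(b" ", i)
        nul = content.index(b"\0", sp)
        yield content[i:sp], content[sp + 1:nul], content[nul + 1:nul + 21]
        i = nul + 21

def feed(h, store: dict, oid: bytes) -> None:
    """Stream "<type> <length>" NUL + contents, then recurse into tree entries."""
    obj_type, content = store[oid]
    h.update(f"{obj_type} {len(content)}".encode() + b"\0" + content)
    if obj_type == "tree":
        for mode, _name, child in parse_tree(content):
            if mode == b"160000":  # assumption: skip submodule commits
                continue
            feed(h, store, child)

def commit_digest(store: dict, commit_oid: bytes, algo: str = "sha512") -> str:
    """Hash the commit, its top-level tree, then everything reachable from it."""
    h = hashlib.new(algo)
    feed(h, store, commit_oid)  # a commit is streamed but not recursed into
    _, content = store[commit_oid]
    first_line = content.split(b"\n", 1)[0]  # "tree <40 hex digits>"
    feed(h, store, bytes.fromhex(first_line[len(b"tree "):].decode()))
    return h.hexdigest()
```

A blob that appears at several paths is streamed once per occurrence, which keeps the walk simple and deterministic at the cost of some redundant hashing.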


Thread overview: 13+ messages
2015-02-28 14:48 weaning distributions off tarballs: extended verification of git tags Colin Walters
2015-02-28 19:14 ` brian m. carlson
2015-02-28 20:34 ` Morten Welinder
2015-03-02 17:09   ` Colin Walters
2015-03-02 18:12     ` Joey Hess
2015-03-02 19:38       ` Sam Vilain
2015-03-02 20:08         ` Junio C Hamano
2015-03-02 20:52           ` Sam Vilain
2015-03-02 23:20       ` Duy Nguyen
2015-03-02 23:44         ` Junio C Hamano [this message]
2015-03-03  0:42           ` Duy Nguyen
2015-03-05 12:36           ` Michael Haggerty
2015-07-08  4:00 ` Colin Walters
