From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Git Mailing List <git@vger.kernel.org>,
Stefan Beller <sbeller@google.com>,
bmwill@google.com, Jonathan Tan <jonathantanmy@google.com>,
Jeff King <peff@peff.net>, David Lang <david@lang.hm>,
"brian m. carlson" <sandals@crustytoothpaste.net>,
Masaya Suzuki <masayasuzuki@google.com>,
demerphq@gmail.com, The Keccak Team <keccak@noekeon.org>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH v4] technical doc: add a design doc for hash function transition
Date: Mon, 02 Oct 2017 17:25:15 +0900
Message-ID: <xmqq3772ot1w.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20170929173413.GI19555@aiede.mtv.corp.google.com> (Jonathan Nieder's message of "Fri, 29 Sep 2017 10:34:13 -0700")
Jonathan Nieder <jrnieder@gmail.com> writes:
>>> +6. Skip fetching some submodules of a project into a NewHash
>>> + repository. (This also depends on NewHash support in Git
>>> + protocol.)
>>
>> It is unclear what this means. Around submodule support, one thing
>> I can think of is that a NewHash tree in a superproject would record
>> a gitlink that is a NewHash commit object name in it, therefore it
>> cannot refer to an unconverted SHA-1 submodule repository. But it
>> is unclear if the above description refers to the same issue, or
>> something else.
>
> It refers to that issue.
We may want to find a way to make it clear, then.
>> It makes me wonder if we want to add the hashname in this object
>> header. "length" would be different for non-blob objects anyway,
>> and it is not "compat metadata" we want to avoid baked in, yet it
>> would help diagnose a mistake of attempting to use "mixed" objects
>> in a single repository. Not a big issue, though.
>
> Do you mean that adding the hashname into the computation that
> produces the object name would help in some use case?
What I mean is that for SHA-1 objects we keep the object header as
"<type> <length> NUL". For objects in the newer world, we change the
object header to "<type> <hash> <length> NUL", and include the
hashname in the object name computation.
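To illustrate the two header layouts, here is a minimal sketch in Python. It is not taken from the design doc: the function names are made up for illustration, and "sha256" merely stands in for NewHash, which has not been chosen.

```python
import hashlib


def sha1_object_name(obj_type: str, payload: bytes) -> str:
    """Object name with the existing header: "<type> <length> NUL"."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def newhash_object_name(obj_type: str, payload: bytes,
                        hash_name: str = "sha256") -> str:
    """Object name with the suggested header: "<type> <hash> <length> NUL".

    ASSUMPTION: hash_name="sha256" is only a placeholder for NewHash.
    The hashname is part of the hashed bytes, so the same payload named
    under two different hashes never shares a header.
    """
    header = f"{obj_type} {hash_name} {len(payload)}".encode() + b"\0"
    return hashlib.new(hash_name, header + payload).hexdigest()


if __name__ == "__main__":
    print(sha1_object_name("blob", b"hello\n"))
    print(newhash_object_name("blob", b"hello\n"))
```

The first function should agree with what "git hash-object" prints for a blob today; the second differs from a plain hash over the old header only by the extra "<hash>" token.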
> For loose objects, it would be nice to name the hash in the file, so
> that "file" can understand what is happening if someone accidentally
> mixes types using "cp". The only downside is losing the ability to
> copy blobs (which have the same content despite being named using
> different hashes) between repositories after determining their new
> names. That doesn't seem like a strong downside --- it's pretty
> harmless to include the hash type in loose object files, too. I think
> I would prefer this to be a "magic number" instead of part of the
> zlib-deflated payload, since this way "file" can discover it more
> easily.
Yeah, thanks for doing pros-and-cons for me ;-)
>> If it is a goal to eventually be able to lose SHA-1 compatibility
>> metadata from the objects, then we might want to remove SHA-1 based
>> signature bits (e.g. PGP trailer in signed tag, gpgsig header in the
>> commit object) from NewHash contents, and instead have them stored
>> in a side "metadata" table, only to be used while converting back.
>> I dunno if that is desirable.
>
> I don't consider that desirable.
Agreed. Let's not go there.
>> Hmm, as the corresponding packfile stores object data only in
>> NewHash content format, it is somewhat curious that this table that
>> stores CRC32 of the data appears in the "Tables for each object
>> format" section, as they would be identical, no? Unless I am
>> grossly misreading the spec, the checksum should either go outside
>> the "Tables for each object format" section but still in .idx, or
>> should be eliminated and become part of the packdata stream instead,
>> perhaps?
>
> It's actually only present for the first object format. Will find a
> better way to describe this.
I see. One way to do so is to have it upfront before the "after
this point, these tables repeat for each of the hashes" part of the
file.
>> Oy. So we can go from a short prefix to the pack location by first
>> finding it via binsearch in the short-name table, realizing that it is
>> the nth object in the object name order, and consulting this table.
>> When we know the pack-order of an object, there is no direct way to
>> go to its location (short of reversing the name-order-to-pack-order
>> table)?
>
> An earlier version of the design also had a pack-order-to-pack-offset
> table, but we weren't able to think of any cases where that would be
> used without also looking up the object name that can be used to
> verify the integrity of the inflated object.
The primary thing I was interested in knowing was if we tried to
think of any case where it may be useful and then didn't think of
any---I couldn't but I know I am not imaginative enough, and I
wanted to know you guys didn't, either.
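
For the archives, the two lookup directions discussed above can be sketched roughly as follows. This is only an illustration under my reading of the tables: the in-memory lists, the function names, and the assumption that the offset table is indexed by object name order are not taken from the patch, and ambiguous prefixes are not handled.

```python
from bisect import bisect_left


def offset_from_prefix(short_names, offsets_by_name_order, prefix):
    """Short prefix -> pack offset.

    short_names is the sorted short-name table; the index n found by
    binary search is the object's position in object name order.
    ASSUMPTION: offsets_by_name_order is indexed by that same n.
    """
    n = bisect_left(short_names, prefix)
    if n == len(short_names) or not short_names[n].startswith(prefix):
        return None  # no object with this prefix
    return offsets_by_name_order[n]


def offset_from_pack_order(name_to_pack_order, offsets_by_name_order, pack_pos):
    """Pack order -> pack offset, without a dedicated table.

    With only a name-order-to-pack-order table available, the mapping
    has to be reversed first; here that is a linear scan.
    """
    n = name_to_pack_order.index(pack_pos)
    return offsets_by_name_order[n]


if __name__ == "__main__":
    short_names = ["0a1b", "3c4d", "9e8f"]  # sorted short names
    offsets = [120, 12, 480]                # pack offsets, name order
    name_to_pack = [1, 0, 2]                # name order -> pack order
    print(offset_from_prefix(short_names, offsets, "3c"))
    print(offset_from_pack_order(name_to_pack, offsets, 0))
```

The second function is the "reversing" case: it works, but costs a scan (or a precomputed inverse) that a pack-order-to-pack-offset table would have avoided.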