git@vger.kernel.org mailing list mirror (one of many)
From: dwh@linuxprogrammer.org
To: Junio C Hamano <gitster@pobox.com>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	git@vger.kernel.org
Subject: Re: Is the sha256 object format experimental or not?
Date: Thu, 13 May 2021 16:26:14 -0700
Message-ID: <20210513232614.GF11882@localhost>
In-Reply-To: <xmqqo8de9wis.fsf@gitster.g>

On 14.05.2021 06:03, Junio C Hamano wrote:
>dwh@linuxprogrammer.org writes:
>
>> I think Git should externalize the calculation of object digests just
>> like it externalizes the calculation of object digital signatures.
>
>The hashing algorithms used to generate object names have
>requirements fundamentally different from those of digital
>signatures.  I strongly suspect that that fact would change the
>equation when you rethink what you said above.

I agree with you. Object names are exactly that: names. Names for
resources/data must be persistent, as well as global in scope and
uniqueness, and autonomously assigned. What this means is that once an
object has a name, that name shall never change as long as the object
remains unchanged. The names must be unique in the scope of all objects
(e.g. all copies of a repo) and generated without coordination.

Calculating object names using a digest algorithm meets all of these
requirements. Choosing a strong digest algorithm creates a strong
cryptographic binding between the name and the object contents. Using
self-describing digests allows for a repo to switch digest algorithms at
arbitrary points in the history.
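
A minimal sketch of what a self-describing name could look like, assuming
a one-byte algorithm tag in front of the raw digest and base64url text
encoding. The tag values and function names are hypothetical, chosen only
for illustration; they are not part of Git or of any existing standard.
The object framing ("<kind> <size>", a NUL byte, then the content) is the
same framing Git hashes today.

```python
import base64
import hashlib

# Hypothetical one-byte algorithm tags (assumption, not a Git format).
ALGORITHMS = {0x01: "sha256", 0x02: "sha3_256"}
TAGS = {name: tag for tag, name in ALGORITHMS.items()}


def git_object_bytes(kind, content):
    # Git hashes the header "<kind> <size>", a NUL byte, then the content.
    return f"{kind} {len(content)}".encode() + b"\0" + content


def self_describing_name(algo, kind, content):
    # Prefix the raw digest with its algorithm tag, then encode as text.
    digest = hashlib.new(algo, git_object_bytes(kind, content)).digest()
    raw = bytes([TAGS[algo]]) + digest
    return base64.urlsafe_b64encode(raw).decode("ascii")


def parse_name(name):
    # Recover (algorithm name, raw digest) from a self-describing name.
    raw = base64.urlsafe_b64decode(name)
    return ALGORITHMS[raw[0]], raw[1:]
```

With a 32-byte digest the tagged value is 33 bytes, which encodes to 44
base64url characters with no padding. A reader of the name learns the
algorithm from the name itself rather than from repository-wide
configuration.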

I think that objects named with SHA1 digests should remain named with
the SHA1 digest. I do *not* advocate going back and rewriting history
to change all of the object names to a digest with a different
algorithm. Git is a provenance log and history matters. I recommend
preserving all existing names, even if they were created with known-weak
digest algorithms, and making the change to a new algorithm at a
specific point in time (e.g. at a tag). Using self-describing digest
encoding and externalizing digest calculation future-proofs
repositories and allows for preservation of history while allowing
algorithm agility.
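
As a rough sketch of what externalized digest calculation might look
like, assume a helper program that reads the framed object bytes on stdin
and prints a hex digest on stdout. This protocol is an assumption made
for illustration, not an existing Git feature, and the helper below is
just the Python interpreter standing in for whatever program a repo would
configure.

```python
import hashlib
import subprocess
import sys

# Stand-in for a user-configured helper program (hypothetical protocol:
# object bytes on stdin, hex digest on stdout).
HELPER = [
    sys.executable,
    "-c",
    "import hashlib, sys; "
    "print(hashlib.sha256(sys.stdin.buffer.read()).hexdigest())",
]


def external_digest(helper, data):
    # Pipe the data to the helper and return the hex digest it prints.
    result = subprocess.run(helper, input=data, capture_output=True, check=True)
    return result.stdout.decode("ascii").strip()
```

Under this assumption, the caller never needs to know which algorithm the
helper implements; swapping the helper swaps the algorithm.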

To illustrate my point, I envision that a repo could have a history
like this:

object 2923f6fa36614586ea09b4424b438915cc1b9b67 (naked SHA1)
  |
<many objects named with SHA1>
  |
object 5f167fb6b3e96273b564fff0b041fb94fee4d3de (naked SHA1)
  |
<modify Git to ext. digest calculation and self-desc encoding>
  |
object 98c2e1c0965e60b0f137577ac5dd0a5c96ce224d (naked SHA1)
  |
<many objects named with SHA1>
  |
<a project decides to switch to SHA2-256, maybe marked in a tag>
  |
object IAOdLVxteOxQwKa-xn8yCBUkuPkjAqcuQ2V7fKAlao8o (self-desc.SHA2-256)
  |
<many objects named with self-describing SHA2-256 digests>
  |
<a project decides to switch to SHA3-256, maybe marked in a tag>
  |
object EK832G0PFhBFf-Dfgr205UKpUMqmVXJX9ltLwQo4Awct (self-desc.SHA3-256)
  |
<many objects named with self-describing SHA3-256 digests>
  .
  .
  .

Neither the switch to SHA2-256 nor the one to SHA3-256 would require any
code changes. If we continue down the current SHA-256 road, we will have
to repeat that multi-year effort in the future to switch to SHA3 or
something else. Most importantly, the choice of digest algorithm would
be left up to the maintainers of a given repo and not limited to the
algorithms we have hard coded into Git.
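
To make the mixed history concrete, here is a sketch of a verifier that
accepts both naked SHA1 names and tagged names. It assumes that a
40-character lowercase hex string is always a naked SHA1, and that any
other name is base64url with a one-byte algorithm tag in front; the tag
table is hypothetical. Adding an algorithm means adding a table entry,
with no change to the verification logic.

```python
import base64
import hashlib

# Hypothetical tag table (assumption); a repo could extend it freely.
ALGORITHMS = {0x01: "sha256", 0x02: "sha3_256"}


def framed(kind, content):
    # Git hashes the header "<kind> <size>", a NUL byte, then the content.
    return f"{kind} {len(content)}".encode() + b"\0" + content


def split_name(name):
    # Return (algorithm, raw digest) for a naked SHA1 or a tagged name.
    if len(name) == 40 and all(c in "0123456789abcdef" for c in name):
        return "sha1", bytes.fromhex(name)
    raw = base64.urlsafe_b64decode(name)
    return ALGORITHMS[raw[0]], raw[1:]


def verify(name, kind, content):
    # Recompute the digest with the algorithm the name itself declares.
    algo, expected = split_name(name)
    return hashlib.new(algo, framed(kind, content)).digest() == expected
```

Old SHA1 names keep verifying unchanged, while newer objects carry their
algorithm with them.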

Brian's work on the SHA-256 switch is valuable. We can leverage a lot of
it to switch to externalized digest calculation and self-describing
digests and never have to worry about doing that again.

Cheers!
Dave
