git@vger.kernel.org mailing list mirror (one of many)
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: git@vger.kernel.org
Subject: Submodules and SHA-256/SHA-1 interoperability
Date: Sun, 14 Feb 2021 21:50:34 +0000
Message-ID: <YCmbKrTsJhgPHYLc@camp.crustytoothpaste.net>

I'm currently working on the next step of the SHA-256 transition code,
which is SHA-256/SHA-1 interoperability.  Essentially, when we write a
loose object into the store, or when we index a pack, we take one form
of the object, usually the SHA-256 form, and rewrite it so that it is in
its SHA-1 form, and then hash it to determine its SHA-1 name.  We then
write this correspondence either into the loose object index (for loose
objects) or a v3 index (for packs).

Blobs are simply hashed with both algorithms, but trees, commits, and
tags need to be rewritten to use the SHA-1 names of the objects they
refer to.  For most situations, we already have this data, since it will
exist in the loose object index, in some pack index, or elsewhere in the
pack we're indexing.
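
As a rough illustration of the two cases above (not the actual
implementation in Git), the sketch below hashes a blob under both
algorithms and rewrites a raw tree from its SHA-256 form to its SHA-1
form.  The helper names and the in-memory `sha256_to_sha1` dict are
hypothetical stand-ins for the loose object index or pack index lookup.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes, algo: str) -> bytes:
    """Hash "<type> <size>\\0<body>", the way Git names objects (raw bytes)."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.new(algo, header + body).digest()


def parse_tree(body: bytes, oid_len: int):
    """Yield (mode, name, oid) from a raw tree body with oid_len-byte IDs."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)   # end of the octal mode
        nul = body.index(b"\0", space)  # end of the entry name
        yield (body[pos:space],
               body[space + 1:nul],
               body[nul + 1:nul + 1 + oid_len])
        pos = nul + 1 + oid_len


def rewrite_tree_to_sha1(body: bytes, sha256_to_sha1: dict) -> bytes:
    """Replace every 32-byte SHA-256 ID in a tree with its 20-byte SHA-1 ID.

    Raises KeyError if a referenced object has no known SHA-1 name.
    """
    out = bytearray()
    for mode, name, oid in parse_tree(body, 32):
        out += mode + b" " + name + b"\0" + sha256_to_sha1[oid]
    return bytes(out)


# A blob needs no rewriting: the same bytes are hashed twice.
blob = b"hello\n"
blob_sha256 = git_object_id("blob", blob, "sha256")
blob_sha1 = git_object_id("blob", blob, "sha1")

# A tree must be rewritten before its SHA-1 name can be computed.
tree_sha256_form = b"100644 hello.txt\0" + blob_sha256
tree_sha1_form = rewrite_tree_to_sha1(tree_sha256_form, {blob_sha256: blob_sha1})
tree_sha1 = git_object_id("tree", tree_sha1_form, "sha1")
```

The rewrite only works because the blob's SHA-1 name is already in the
mapping when the tree is processed.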

However, for submodules, we have a problem.  By definition, the commit
that a submodule entry points to exists in a different repository.  If
we have the submodule locally on the system, then this works fine, but
if we're performing a fetch or clone and the submodule is not present,
then we cannot rewrite the tree or anything that refers to it, directly
or indirectly.
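
To make the failure concrete, here is a small sketch (again, only an
illustration) that scans a SHA-256 tree for submodule entries, which
Git stores with mode 160000, and reports the ones with no known SHA-1
name.  The function names and the `sha256_to_sha1` dict are
hypothetical.

```python
GITLINK_MODE = b"160000"  # mode Git uses for submodule (gitlink) entries


def parse_tree(body: bytes, oid_len: int):
    """Yield (mode, name, oid) from a raw tree body with oid_len-byte IDs."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        yield (body[pos:space],
               body[space + 1:nul],
               body[nul + 1:nul + 1 + oid_len])
        pos = nul + 1 + oid_len


def unresolved_gitlinks(body: bytes, sha256_to_sha1: dict) -> list:
    """Return names of submodule entries whose SHA-1 name is unknown."""
    return [name
            for mode, name, oid in parse_tree(body, 32)
            if mode == GITLINK_MODE and oid not in sha256_to_sha1]


# Placeholder IDs: a file we can map, and a submodule commit we cannot.
file_oid = bytes([1]) * 32
submodule_oid = bytes([2]) * 32
tree = (b"100644 README\0" + file_oid +
        b"160000 libfoo\0" + submodule_oid)

known = {file_oid: bytes([3]) * 20}
blocked = unresolved_gitlinks(tree, known)
```

Any tree with a non-empty result here cannot be given a SHA-1 name, and
neither can any tree or commit that reaches it.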

So there are some possible courses of action:

* Disallow compatibility algorithms when using submodules.  This is
  simple, but inconvenient.
* Force users to always clone submodules and fetch them before fetching
  the main repository.  This is also relatively simple, but
  inconvenient.
* Have the remote server keep a list of correspondences and send them in
  a protocol extension.
* Just skip rewriting objects until the data is filled in later and
  admit the data will be incomplete.  This means that pushing to or
  pulling from a repository using an incompatible algorithm will be
  impossible.
* Something else I haven't thought of.

The third option is where I'm leaning, but it has some potential
downsides.  First, the server must support both hash algorithms and have
this data.  Second, it essentially requires all submodule updates to be
pushed from a compatible client.  Third, we need to trust that the
server hasn't tampered with the data, which should be possible by doing
an fsck on both forms (I think).  Fourth, we need to store this
somewhere, and the only place we have right now is the loose object
index, which would potentially grow to inefficient sizes.
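
For the third point, one way to picture the check is the sketch below:
once the submodule commit is actually available locally, its SHA-1 form
can be recomputed and compared against what the server claimed.  This
is an assumption about how such verification might look, not a
description of fsck.  The function names and the hex-to-hex `mapping`
dict are hypothetical, and the sketch only rewrites "tree" and "parent"
headers, ignoring things like signatures and mergetags.

```python
import hashlib


def rewrite_commit_to_sha1(body: bytes, mapping: dict) -> bytes:
    """Rewrite tree/parent headers of a SHA-256 commit to SHA-1 hex names.

    `mapping` maps SHA-256 hex strings to SHA-1 hex strings.
    """
    header, sep, message = body.partition(b"\n\n")
    lines = []
    for line in header.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key in (b"tree", b"parent"):
            line = key + b" " + mapping[value.decode()].encode()
        lines.append(line)
    return b"\n".join(lines) + sep + message


def matches_claim(body: bytes, claimed_sha1_hex: str, mapping: dict) -> bool:
    """True if the commit's recomputed SHA-1 name equals the claimed one."""
    sha1_form = rewrite_commit_to_sha1(body, mapping)
    header = b"commit %d\0" % len(sha1_form)
    return hashlib.sha1(header + sha1_form).hexdigest() == claimed_sha1_hex


# Placeholder commit whose tree is named by a made-up SHA-256 ID.
commit = (b"tree " + b"a" * 64 + b"\n"
          b"author A U Thor <author@example.com> 0 +0000\n"
          b"committer A U Thor <author@example.com> 0 +0000\n"
          b"\n"
          b"example\n")
mapping = {"a" * 64: "b" * 40}
ok = matches_claim(commit, "0" * 40, mapping)  # a bogus claim is rejected
```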

A variation would be to ask the submodule's server, rather than the
main repository's server, for the list of correspondences via a new
protocol extension.  That has the same downsides except for the second
one, and additionally means that we'd need to make multiple connections.

So I'm seeking some ideas on which approach we want to use here before
I start sinking a lot of work into this.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US
