From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: git@vger.kernel.org
Subject: Re: Submodules and SHA-256/SHA-1 interoperability
Date: Mon, 1 Mar 2021 20:28:13 +0100 (CET)
Message-ID: <nycvar.QRO.7.76.6.2103011621000.57@tvgsbejvaqbjf.bet>
In-Reply-To: <YCmbKrTsJhgPHYLc@camp.crustytoothpaste.net>

Hi brian,

On Sun, 14 Feb 2021, brian m. carlson wrote:

> I'm currently working on the next step of the SHA-256 transition code,
> which is SHA-256/SHA-1 interoperability.  Essentially, when we write a
> loose object into the store, or when we index a pack, we take one form
> of the object, usually the SHA-256 form, and rewrite it so that it is in
> its SHA-1 form, and then hash it to determine its SHA-1 name.  We then
> write this correspondence either into the loose object index (for loose
> objects) or a v3 index (for packs).
>
> Blobs are simply hashed with both algorithms, but trees, commits, and
> tags need to be rewritten to use the SHA-1 names of the objects they
> refer to.  For most situations, we already have this data, since it will
> exist in the loose object index, in some pack index, or elsewhere in the
> pack we're indexing.
>
> However, for submodules, we have a problem.  By definition, the object
> exists in a different repository.  If we have the submodule locally on
> the system, then this works fine, but if we're performing a fetch or
> clone and the submodule is not present, then we cannot rewrite the tree
> or anything that refers to it, directly or indirectly.
>
> So there are some possible courses of action:
>
> * Disallow compatibility algorithms when using submodules.  This is
>   simple, but inconvenient.
> * Force users to always clone submodules and fetch them before fetching
>   the main repository.  This is also relatively simple, but
>   inconvenient.
> * Have the remote server keep a list of correspondences and send them in
>   a protocol extension.
> * Just skip rewriting objects until the data is filled in later and
>   admit the data will be incomplete.  This means that pushing to or
>   pulling from a repository using an incompatible algorithm will be
>   impossible.
> * Something else I haven't thought of.
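
For concreteness, here is a rough Python sketch of the rewrite step described
above. It is not Git's implementation: the helper names (`object_name`,
`parse_tree`, `tree_to_sha1`) and the in-memory `mapping` dict standing in for
the loose object index are assumptions made for illustration. It only relies
on the object header format ("<type> <size>\0" plus body) and the raw tree
entry layout ("<mode> <name>\0<raw hash>"), and it shows where a submodule
entry (mode 160000) gets stuck when its commit is not available locally.

```python
import hashlib

GITLINK_MODE = b"160000"  # tree-entry mode Git uses for a submodule commit


def object_name(algo: str, kind: bytes, body: bytes) -> bytes:
    """Hash a Git object: header '<type> <size>' + NUL, followed by the body."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.new(algo, header + body).digest()


def parse_tree(body: bytes, hash_len: int):
    """Yield (mode, name, raw_hash) for each entry of a raw tree body."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        end = nul + 1 + hash_len
        yield body[pos:space], body[space + 1:nul], body[nul + 1:end]
        pos = end


def tree_to_sha1(body256: bytes, mapping: dict) -> bytes:
    """Rewrite a SHA-256 tree body so every entry uses its SHA-1 name.

    `mapping` (hypothetical) maps raw SHA-256 names to raw SHA-1 names.
    """
    out = []
    for mode, name, oid256 in parse_tree(body256, 32):
        if oid256 not in mapping:
            what = "submodule commit" if mode == GITLINK_MODE else "object"
            raise LookupError(f"no SHA-1 name for {what} {name.decode()!r}")
        out.append(mode + b" " + name + b"\0" + mapping[oid256])
    return b"".join(out)


# A blob can simply be hashed with both algorithms.
blob = b"hello\n"
blob256 = object_name("sha256", b"blob", blob)
mapping = {blob256: object_name("sha1", b"blob", blob)}

# The submodule commit lives in another repository, so nothing maps it.
sub256 = hashlib.sha256(b"stand-in for a submodule commit").digest()
tree256 = b"100644 README\0" + blob256 + b"160000 lib\0" + sub256

try:
    tree_to_sha1(tree256, mapping)
except LookupError as err:
    print(err)  # the gitlink entry is the one that cannot be rewritten
```

Once a correspondence for the submodule commit is added to `mapping`, the
same call succeeds, and the SHA-1 name of the tree is
`object_name("sha1", b"tree", rewritten_body)`.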

While my strong urge is to add "Remove support for submodules" (which BTW
would also plug so many attack vectors that have led to many a
vulnerability in the past), I understand that this would be impractical:
the figurative barn door has been open for way too long to do that.

But I'd like to put another idea into the fray: store the mapping in
`.gitmodules`. That is, each time `git submodule add <...>` is called, it
would update `.gitmodules` to list SHA-1 *and* SHA-256 for the given path.

That would relieve us of the problem where we rely on a server's ability
to give us that mapping.
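
To illustrate what such a `.gitmodules` entry might look like, here is a
minimal Python sketch. The `sha1` and `sha256` keys are purely hypothetical
(Git does not define them today), the hash values are placeholders, and
`parse_gitmodules` and `hash_correspondences` are names invented for this
example. The parser only handles the simple `[submodule "<name>"]` /
`key = value` subset of the config syntax.

```python
# Placeholder hex names; real values would be the submodule commit's two names.
SHA1_HEX = "a" * 40
SHA256_HEX = "b" * 64

# Hypothetical stanza: `sha1` and `sha256` are NOT existing .gitmodules keys.
SAMPLE = f"""\
[submodule "lib"]
\tpath = lib
\turl = https://example.com/lib.git
\tsha1 = {SHA1_HEX}
\tsha256 = {SHA256_HEX}
"""


def parse_gitmodules(text: str) -> dict:
    """Parse the simple subset of .gitmodules syntax used in SAMPLE."""
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        if line.startswith('[submodule "') and line.endswith('"]'):
            name = line[len('[submodule "'):-2]
            current = modules.setdefault(name, {})
        elif line.startswith("["):
            current = None  # some other section; ignore its keys
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip().lower()] = value.strip()
    return modules


def hash_correspondences(modules: dict) -> dict:
    """Return {sha256_hex: sha1_hex} for each submodule that lists both."""
    return {
        m["sha256"]: m["sha1"]
        for m in modules.values()
        if "sha1" in m and "sha256" in m
    }


modules = parse_gitmodules(SAMPLE)
print(hash_correspondences(modules))
```

Since `.gitmodules` is versioned together with the tree that holds the
submodule entry, a client could in principle read the pair from the same
commit it is rewriting, without asking any server.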

Ciao,
Dscho

> The third option is where I'm leaning, but it has some potential
> downsides.  First, the server must support both hash algorithms and have
> this data.  Second, it essentially requires all submodule updates to be
> pushed from a compatible client.  Third, we need to trust that the
> server hasn't tampered with the data, which should be possible by doing
> an fsck on both forms (I think).  Fourth, we need to store this
> somewhere, and the only place we have right now is the loose object
> index, which would potentially grow to inefficient sizes.
>
> We could potentially change this to be slightly different by asking the
> submodule server for a list of correspondences instead via a new
> protocol extension, but it has the same downsides except for the second
> one, and additionally means that we'd need to make multiple connections.
>
> So I'm seeking some ideas on which approach we want to use here before
> I start sinking a lot of work into this.
> --
> brian m. carlson (he/him or they/them)
> Houston, Texas, US
>

Thread overview: 4+ messages
2021-02-14 21:50 Submodules and SHA-256/SHA-1 interoperability brian m. carlson
2021-03-01 19:28 ` Johannes Schindelin [this message]
2021-03-13 19:42   ` brian m. carlson
2021-03-19 14:23     ` Johannes Schindelin
