I'm currently working on the next step of the SHA-256 transition code,
which is SHA-256/SHA-1 interoperability.  Essentially, when we write a
loose object into the store, or when we index a pack, we take one form
of the object, usually the SHA-256 form, and rewrite it so that it is in
its SHA-1 form, and then hash it to determine its SHA-1 name.  We then
write this correspondence either into the loose object index (for loose
objects) or a v3 index (for packs).

Blobs are simply hashed with both algorithms, but trees, commits, and
tags need to be rewritten to use the SHA-1 names of the objects they
refer to.  For most situations, we already have this data, since it will
exist in the loose object index, in some pack index, or elsewhere in the
pack we're indexing.
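
As a rough illustration of the blob case (a Python sketch, not Git's
actual C code): an object name is the hash of a "<type> <size>" header,
a NUL byte, and the content, so both names can be computed from the same
bytes without any rewriting.  The helper name object_name is made up for
this example.

```python
import hashlib

def object_name(obj_type: str, content: bytes, algo: str) -> str:
    # Git hashes "<type> <size>", a NUL byte, then the object content.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.new(algo, header + content).hexdigest()

content = b"hello\n"
sha256_name = object_name("blob", content, "sha256")
sha1_name = object_name("blob", content, "sha1")

# One entry of the correspondence that would be written to the index.
print(sha256_name, "->", sha1_name)
```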

However, for submodules, we have a problem.  By definition, the object
exists in a different repository.  If we have the submodule locally on
the system, then this works fine, but if we're performing a fetch or
clone and the submodule is not present, then we cannot rewrite the tree
or anything that refers to it, directly or indirectly.
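
To show where this breaks down, here is a minimal Python sketch of
rewriting a tree, assuming the usual tree layout of "<mode> <name>", a
NUL byte, and the raw object ID per entry.  The function names and the
in-memory mapping are hypothetical stand-ins for the loose object index
and pack indexes; the submodule ID is a placeholder.

```python
import hashlib

GITLINK_MODE = b"160000"  # tree entry mode used for a submodule commit

def parse_tree(body: bytes, hash_len: int):
    # Each entry is "<mode> <name>", a NUL byte, then hash_len raw ID bytes.
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        end = nul + 1 + hash_len
        yield body[pos:space], body[space + 1:nul], body[nul + 1:end]
        pos = end

def rewrite_tree_to_sha1(body: bytes, sha256_to_sha1: dict) -> bytes:
    # Rebuild a SHA-256 tree body using SHA-1 IDs; fail on any unknown ID.
    out = []
    for mode, name, oid in parse_tree(body, 32):
        if oid not in sha256_to_sha1:
            kind = "submodule commit" if mode == GITLINK_MODE else "object"
            path = name.decode(errors="replace")
            raise LookupError(f"no SHA-1 name known for {kind} at {path!r}")
        out.append(mode + b" " + name + b"\0" + sha256_to_sha1[oid])
    return b"".join(out)

# A blob we have locally, so both of its names are known.
blob = b"blob 6\0" + b"hello\n"
blob256 = hashlib.sha256(blob).digest()
blob1 = hashlib.sha1(blob).digest()
mapping = {blob256: blob1}

tree_ok = b"100644 README\0" + blob256
rewritten = rewrite_tree_to_sha1(tree_ok, mapping)

# Placeholder ID standing in for a commit that lives in another repository.
sub256 = hashlib.sha256(b"commit in another repository").digest()
tree_sub = tree_ok + b"160000 lib\0" + sub256
try:
    rewrite_tree_to_sha1(tree_sub, mapping)
    failure = None
except LookupError as err:
    failure = str(err)
    print("cannot rewrite:", failure)
```

The first tree rewrites cleanly; the second cannot be rewritten, and
neither can any tree or commit that points at it.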

So there are some possible courses of action:

* Disallow compatibility algorithms when using submodules.  This is
  simple, but inconvenient.
* Force users to always clone submodules and fetch them before fetching
  the main repository.  This is also relatively simple, but
  inconvenient.
* Have the remote server keep a list of correspondences and send them in
  a protocol extension.
* Just skip rewriting objects until the data is filled in later and
  admit the data will be incomplete.  This means that pushing to or
  pulling from a repository using an incompatible algorithm will be
  impossible.
* Something else I haven't thought of.
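
To make the fourth option a bit more concrete, here is a small Python
sketch of what "incomplete" would mean in practice.  It assumes a
simplified model in which each object is just a label plus the labels of
the objects it refers to; the function name and the labels are made up
for illustration.

```python
def split_rewritable(refs, known):
    # refs: object -> objects it refers to.
    # known: objects whose SHA-1 name is already available.
    # Returns (objects that can be rewritten, in order; objects left pending).
    resolved = set(known)
    pending = set(refs) - resolved
    order = []
    progress = True
    while progress:
        progress = False
        for oid in sorted(pending):
            if all(ref in resolved for ref in refs[oid]):
                resolved.add(oid)
                pending.discard(oid)
                order.append(oid)
                progress = True
    return order, pending

refs = {
    "commit": ["root-tree"],
    "root-tree": ["README-blob", "docs-tree", "submodule-commit"],
    "docs-tree": ["guide-blob"],
}
known = {"README-blob", "guide-blob"}  # blobs are hashed directly

order, pending = split_rewritable(refs, known)
print("rewritable now:", order)
print("left pending:", sorted(pending))
```

Only the tree that does not reach the submodule can be rewritten; the
root tree and the commit stay pending until the submodule's
correspondence shows up.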

The third option is where I'm leaning, but it has some potential
downsides.  First, the server must support both hash algorithms and have
this data.  Second, it essentially requires all submodule updates to be
pushed from a compatible client.  Third, we need to be able to verify
that the server hasn't tampered with the data, which should be possible
by running fsck on both forms (I think).  Fourth, we need to store this
somewhere, and the only place we have right now is the loose object
index, which would potentially grow to inefficient sizes.

A variation would be to ask the submodule's server, rather than the main
repository's server, for the list of correspondences via a new protocol
extension.  That has the same downsides except for the second one, and
it additionally means that we'd need to make multiple connections.

So I'm seeking some ideas on which approach we want to use here before
I start sinking a lot of work into this.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US