From: Addison Klinke
Cc: Jason Pyeron, Junio C Hamano, Addison Klinke
Subject: Re: [FR] supporting submodules with alternate version control systems (new contributor)
Date: Wed, 1 Jun 2022 06:44:33 -0600

> rsbecker: move code into a submodule from your own VCS system
> into a git repository and then work with the submodule without the git
> code-base knowing about this

> Philip: uses a proper sub-module that within it then has
> the single 'large' file git-lfs style that hosts the hash reference for
> the data VCS

The downside I see with both of these approaches is that translating
the native data VCS to git (or LFS) negates all the benefits of having
a VCS purpose-built for data. That is why most data versioning tools
exist in the first place: neither git nor LFS is ideal for handling
machine learning datasets.

On Tue, May 10, 2022 at 2:54 PM Philip Oakley wrote:
> On 10/05/2022 18:20, Jason Pyeron wrote:
> >> -----Original Message-----
> >> From: Junio C Hamano
> >> Sent: Tuesday, May 10, 2022 1:01 PM
> >> To: Addison Klinke
> >>
> >> Addison Klinke writes:
> >>
> >>> Is something along these lines feasible?
> >> Offhand, I only think of one thing that could make it fundamentally
> >> infeasible.
> >>
> >> When you bind an external repository (be it stored in Git or
> >> somebody else's system) as a submodule, each commit in the
> >> superproject records which exact commit in the submodule is used
> >> with the rest of the superproject tree.  And that is done by
> >> recording the object name of the commit in the submodule.
> >>
> >> What does this mean for a foreign system that wants to "plug into" a
> >> superproject in Git as a submodule?  It is required to do two
> >> things:
> >>
> >>   * At the time "git commit" is run at the superproject level, the
> >>     foreign system has to be able to say "the version I have to be
> >>     used in the context of this superproject commit is X", with X
> >>     that somehow can be stored in the superproject's tree object
> >>     (which has room for 20 bytes in SHA-1 repositories; in SHA-256
> >>     repositories, it is a bit wider).
> >>
> >>   * At the time "git checkout" is run at the superproject level, the
> >>     superproject will learn the above X (i.e. the version of the
> >>     submodule that goes with the version of the superproject being
> >>     checked out).  The foreign system has to be able to perform a
> >>     "checkout" given that X.
> >>
> >> If a foreign system cannot do the above two, then it fundamentally
> >> would be incapable of participating in such a "superproject and
> >> submodule" relationship.
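
To make the two requirements above concrete, here is a minimal sketch in
Python. Everything in it is an assumption for illustration only: the
ForeignVCS interface, the ToyDataVCS class, and the gitlink_entry helper
are hypothetical names and do not exist in Git. The only Git facts it
relies on are that a submodule is recorded as a tree entry with mode
160000 and that the recorded name is 20 raw bytes in a SHA-1 repository.

```python
import hashlib
from typing import Dict, Protocol


class ForeignVCS(Protocol):
    """Hypothetical adapter a foreign system would need to provide."""

    def current_version(self) -> bytes:
        """Return X, the version to record at superproject commit time."""
        ...

    def checkout(self, version: bytes) -> None:
        """Restore the working tree for a previously recorded X."""
        ...


class ToyDataVCS:
    """In-memory stand-in for a data VCS (illustration only)."""

    def __init__(self) -> None:
        self._snapshots: Dict[bytes, str] = {}
        self.worktree = ""

    def commit(self, data: str) -> bytes:
        # Use a 20-byte SHA-1 digest so the identifier fits the slot
        # that a SHA-1 superproject tree has for a submodule.
        version = hashlib.sha1(data.encode()).digest()
        self._snapshots[version] = data
        self.worktree = data
        return version

    def current_version(self) -> bytes:
        return hashlib.sha1(self.worktree.encode()).digest()

    def checkout(self, version: bytes) -> None:
        self.worktree = self._snapshots[version]


def gitlink_entry(path: str, version: bytes) -> bytes:
    """Build the raw tree entry a SHA-1 superproject would store."""
    if len(version) != 20:
        raise ValueError("a SHA-1 tree entry holds exactly 20 bytes")
    return b"160000 " + path.encode() + b"\0" + version
```

A foreign system whose native identifiers are not 20 (or, for SHA-256,
32) bytes wide would need some mapping layer in front of this, which is
what the next suggestion in the thread is about.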
> The sub-modules already have that problem if the user forgets to publish
> their sub-module (see notes in the docs ;-).
> > The submodule "type" could create an object (hashed and stored) that contains the needed "translation" details. The object would be hashed using SHA1 or SHA256 depending on the git config. The format of the object's contents would be defined by the submodule's "code".
> >
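
A rough sketch of the "translation object" idea above, assuming the
details are stored as an ordinary blob. The way Git names a blob (hash
of "blob <size>\0" followed by the content) is standard; the content
format shown in the usage note is purely made up, since the suggestion
leaves it to the submodule type to define.

```python
import hashlib


def git_object_id(content: bytes, obj_type: str = "blob",
                  algo: str = "sha1") -> str:
    """Compute the object name Git would assign to this content.

    algo is "sha1" or "sha256", matching the repository's object format.
    """
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.new(algo, header + content).hexdigest()
```

For example, a hypothetical translation record such as
b"type: example-data-vcs\nrevision: r42\n" could be passed to
git_object_id(), and the resulting name is what the superproject tree
would record in place of a commit. The foreign system would then look up
its own revision from the stored object at checkout time.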
> Another way of looking at the issue is via a variant of Git-LFS with a
> smudge/clean style filter. I.e. the DataVCS would be treated as a 'file'.
> The LFS already uses the .gitattributes to define a 'type', while the
> submodules don't yet have that capability. There is just a single
> special type within a tree object of "sub-module", being a mode 160000
> commit (see
> One thought is that one uses a proper sub-module that within it then has
> the single 'large' file git-lfs style that hosts the hash reference for
> the data VCS
> ( It would be
> the regular sub-modules .gitattributes file that handles the data
> conversion.
> It may be converting an X-Y problem into an X-Y-Z solution, or just
> extending the problem.
> --
> Philip
