git@vger.kernel.org list mirror (unofficial, one of many)
From: Addison Klinke <addison@baller.tv>
To: git@vger.kernel.org
Cc: Addison Klinke <agk38@case.edu>
Subject: [FR] supporting submodules with alternate version control systems (new contributor)
Date: Tue, 10 May 2022 10:11:48 -0600
Message-ID: <CAE9CXuhvqfhARrqz2=oS1=9BF=iNhGbJv7y3HmYs1tddn8ndiQ@mail.gmail.com>

Hello all,

I'm familiar with open source software development through GitHub, but
have not contributed to git before, so apologies if I'm using the wrong
avenues. Please point me in the right direction if that is the case. I
saw this mailing list mentioned on the
[mirror](https://github.com/git/git) repository, so it seemed like the
right place to start.

I have a feature request I'd like some feedback on. The core idea is
to support submodules with alternate (i.e. non-git based) version
control systems.

* **Why:** Git is excellent for versioning code and I don't need
another VCS for that purpose. However, in machine learning (ML)
workflows it has become increasingly
[standard](https://opendatascience.com/how-data-versioning-can-be-used-in-machine-learning/)
to version your datasets, and for this purpose many git-like tools
have been developed. See [Dolt](https://www.dolthub.com/),
[LakeFS](https://lakefs.io/), and [DVC](https://dvc.org/) for a few
examples. Currently, ML practitioners have to bifurcate their
development process: code is committed/managed with git and datasets
are committed/managed with a 3rd party VCS (and often cloned in a
different folder outside the git repository). My proposal is to unify
the data versioning tools with git submodules so that they can act as
any other 3rd party library inside a parent repository.

* **How:** Most data versioning tools already define a git-like CLI.
For instance, you have "dolt commit", "dvc push", "lakectl diff", etc.
The set of commands and options is usually a subset of the full list
available in git, but the important ones are there. My approach would
require a few steps:

1. Git defines an API for configuring 3rd party VCS tools. It's
essentially a mapping from each git command to its equivalent in the
3rd party tool. This should also account for which options/flags are
supported.
2. Developers of the 3rd party tool integrate with this git API
by maintaining a config file for the mapping that gets installed
alongside their binaries.
3. The .gitmodules syntax is extended to include a "type" field which
defaults to git but can be set to other supported values.
4. End users can then add submodules with an alternate VCS. Once
added, the CLI interaction would appear like normal git but under the
hood it would be using a different engine (and remote storage).
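
To make steps 1 to 4 a bit more concrete, here is a minimal Python sketch
of the kind of mapping I have in mind. Nothing here exists in git today:
`CommandMapping`, `VCS_MAPPINGS`, `submodule_type`, and `translate` are
hypothetical names, the "type" key is the proposed .gitmodules field, and
the flag sets are illustrative assumptions rather than a checked list of
what each tool supports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandMapping:
    """Hypothetical entry mapping one git subcommand onto another tool."""
    argv: tuple                      # command prefix in the 3rd party tool
    flags: frozenset = frozenset()   # options assumed to be supported


# Hypothetical per-tool config (step 2). The commands come from the
# examples above; the flag sets are assumptions for illustration only.
VCS_MAPPINGS = {
    "dolt": {"commit": CommandMapping(("dolt", "commit"), frozenset({"-m"}))},
    "dvc": {"push": CommandMapping(("dvc", "push"))},
    "lakefs": {"diff": CommandMapping(("lakectl", "diff"))},
}


def submodule_type(entry):
    """Return the proposed 'type' field of a submodule entry (step 3)."""
    return entry.get("type", "git")  # defaults to git when the field is absent


def translate(vcs_type, git_args):
    """Translate arguments given to git into the target tool's command line."""
    if not git_args:
        raise ValueError("no subcommand given")
    if vcs_type == "git":
        return ["git", *git_args]
    tool = VCS_MAPPINGS.get(vcs_type)
    if tool is None:
        raise ValueError(f"unsupported submodule type: {vcs_type}")
    subcommand, *rest = git_args
    mapping = tool.get(subcommand)
    if mapping is None:
        raise ValueError(f"{vcs_type} has no mapping for '{subcommand}'")
    for arg in rest:
        # Naive check: anything that looks like an option must be declared.
        if arg.startswith("-") and arg not in mapping.flags:
            raise ValueError(f"{vcs_type} does not support {arg} for '{subcommand}'")
    return [*mapping.argv, *rest]
```

With this sketch, `translate("dolt", ["commit", "-m", "add rows"])` returns
`["dolt", "commit", "-m", "add rows"]`, while an entry without a "type"
field falls through to plain git. Rejecting undeclared flags up front is
one possible way to handle tools that only implement a subset of git's
options.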

Is something along these lines feasible? If so, could someone who is
more familiar with the code base give me a rough idea how one might go
about this? I would like to author the PR to implement this - just
looking for some help getting started.

Thank you for the help,

Addison


Thread overview: 13+ messages
2022-05-10 16:11 Addison Klinke [this message]
2022-05-10 17:00 ` Junio C Hamano
2022-05-10 17:20   ` Jason Pyeron
2022-05-10 17:26     ` Addison Klinke
2022-05-10 18:26       ` rsbecker
2022-05-10 20:54     ` Philip Oakley
2022-06-01 12:44       ` Addison Klinke
2022-06-03 23:06         ` Philip Oakley
2022-06-04  2:01           ` rsbecker
2022-06-04 13:27             ` Philip Oakley
2022-06-04 15:57               ` rsbecker
2022-06-05 21:52                 ` Philip Oakley
2022-06-06 14:53                   ` Addison Klinke
