git@vger.kernel.org mailing list mirror (one of many)
* [FR] supporting submodules with alternate version control systems (new contributor)
From: Addison Klinke @ 2022-05-10 16:11 UTC
  To: git; +Cc: Addison Klinke

Hello all,

I'm familiar with open source software development through GitHub, but
have not contributed to git before, so apologies if I'm using the wrong
avenue. Please point me in the right direction if that is the case. I
saw this mailing list mentioned on the
[mirror](https://github.com/git/git) repository, so it seemed like the
right place to start.

I have a feature request I'd like some feedback on. The core idea is
to support submodules with alternate (i.e. non-git based) version
control systems.

* **Why:** Git is excellent for versioning code and I don't need
another VCS for that purpose. However, in machine learning (ML)
workflows it has become more
[standard](https://opendatascience.com/how-data-versioning-can-be-used-in-machine-learning/)
to version your datasets, and for this purpose many git-like tools
have been developed. See [Dolt](https://www.dolthub.com/),
[LakeFS](https://lakefs.io/), and [DVC](https://dvc.org/) for a few
examples. Currently, ML practitioners have to bifurcate their
development process: code is committed/managed with git, and datasets
are committed/managed with a third-party VCS (and often cloned in a
different folder outside the git repository). My proposal is to unify
the data versioning tools with git submodules so that they can act like
any other third-party library inside a parent repository.

* **How:** Most data versioning tools already define a git-like CLI.
For instance, you have "dolt commit", "dvc push", "lakectl diff", etc.
The set of commands and options is usually a subset of the full list
available in git, but the important ones are there. My approach would
require a few steps:

1. Git defines an API for configuring third-party VCS tools. It's
essentially a mapping from each git command to its equivalent in the
third-party tool. This should also account for which options/flags are
supported.
2. Developers of the third-party tool integrate with this git API
by maintaining a config file for the mapping that gets installed
alongside their binaries.
3. The .gitmodules syntax is extended to include a "type" field, which
defaults to git but can be set to other supported values.
4. End users can then add submodules with an alternate VCS. Once
added, the CLI interaction would appear like normal git, but under the
hood it would be using a different engine (and remote storage).
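
To make the steps above a bit more concrete, here is a minimal Python
sketch of how the mapping and the "type" field might fit together. It is
only an illustration under stated assumptions, not an existing git
feature: `VCS_COMMAND_MAP`, `submodule_types`, `translate`, the
"datasets" submodule, and the example.com URL are all hypothetical names.
The only third-party commands used are the ones mentioned above
("dolt commit", "dvc push", "lakectl diff"), and options are passed
through unchanged, whereas a real design would need to check which
flags each tool supports.

```python
import configparser

# Hypothetical mapping (steps 1-2): submodule type -> git subcommand ->
# equivalent command prefix in the third-party tool.
VCS_COMMAND_MAP = {
    "dolt": {"commit": ["dolt", "commit"]},
    "dvc": {"push": ["dvc", "push"]},
    "lakefs": {"diff": ["lakectl", "diff"]},
}

# Hypothetical .gitmodules content (step 3). The "type" key is the
# proposed extension and does not exist in git today.
GITMODULES_TEXT = """
[submodule "datasets"]
path = data/datasets
url = https://example.com/datasets
type = dolt
"""


def submodule_types(text):
    """Return {submodule path: VCS type}, defaulting to "git"."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        parser[section]["path"]: parser[section].get("type", fallback="git")
        for section in parser.sections()
    }


def translate(vcs_type, git_args):
    """Translate git-style arguments into the backend's command line (step 4)."""
    if vcs_type == "git":
        return ["git", *git_args]
    subcommand, *rest = git_args
    try:
        prefix = VCS_COMMAND_MAP[vcs_type][subcommand]
    except KeyError:
        raise ValueError(
            f"'{subcommand}' is not supported for submodule type '{vcs_type}'"
        ) from None
    # Assumption: remaining options are forwarded as-is.
    return [*prefix, *rest]


if __name__ == "__main__":
    for path, vcs_type in submodule_types(GITMODULES_TEXT).items():
        print(path, translate(vcs_type, ["commit", "-m", "update data"]))
```

With this layout, a command that a tool does not map (for example
"commit" for the "dvc" entry above) fails with a clear error instead of
silently falling back to git.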

Is something along these lines feasible? If so, could someone who is
more familiar with the code base give me a rough idea how one might go
about this? I would like to author the PR to implement this - just
looking for some help getting started.

Thank you for the help,

Addison


