From: Addison Klinke <addison@baller.tv>
To: git@vger.kernel.org
Cc: Addison Klinke <agk38@case.edu>
Subject: [FR] supporting submodules with alternate version control systems (new contributor)
Date: Tue, 10 May 2022 10:11:48 -0600
Message-ID: <CAE9CXuhvqfhARrqz2=oS1=9BF=iNhGbJv7y3HmYs1tddn8ndiQ@mail.gmail.com>

Hello all,

I'm familiar with open-source software development through GitHub, but I have not contributed to git before, so apologies if I'm using the wrong avenues. Please point me in the right direction if that is the case. I saw this mailing list mentioned on the [mirror](https://github.com/git/git) repository, so it seemed like the right place to start.

I have a feature request I'd like some feedback on. The core idea is to support submodules with alternate (i.e. non-git based) version control systems.

* **Why:** Git is excellent for versioning code and I don't need another VCS for that purpose. However, in machine learning (ML) workflows it has become more [standard](https://opendatascience.com/how-data-versioning-can-be-used-in-machine-learning/) to version your datasets, and for this purpose many git-like tools have been developed. See [Dolt](https://www.dolthub.com/), [LakeFS](https://lakefs.io/), and [DVC](https://dvc.org/) for a few examples. Currently, ML practitioners have to bifurcate their development process - code is committed/managed with git, and datasets are committed/managed with a 3rd party VCS (and often cloned in a different folder outside the git repository). My proposal is to unify the data versioning tools with git submodules so that a dataset can be treated like any other 3rd party library inside a parent repository.

* **How:** Most data versioning tools already define a git-like CLI. For instance, you have "dolt commit", "dvc push", "lakectl diff", etc. The set of commands and options is usually a subset of the full list available in git, but the important ones are there. My approach would require a few steps:

1. Git defines an API for configuring 3rd party VCS tools. It's essentially a mapping from each git command to its equivalent in the 3rd party tool. The mapping should also account for which options/flags are supported.
2. Developers of the 3rd party tool integrate with this git API by maintaining a config file for the mapping, which gets installed alongside their binaries.
3. The .gitmodules syntax is extended to include a "type" field, which defaults to git but can be set to other supported values.
4. End users can then add submodules that use an alternate VCS. Once added, the CLI interaction would look like normal git, but under the hood it would be using a different engine (and remote storage).

Is something along these lines feasible? If so, could someone who is more familiar with the code base give me a rough idea of how one might go about this? I would like to author the PR to implement this - just looking for some help getting started.

Thank you for the help,
Addison
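To make steps 1 and 3 of the proposal more concrete, here is a minimal sketch of how such a mapping and the proposed "type" field might fit together. Everything in it is hypothetical: git has no such API today, the `VCS_COMMANDS` table, the `submodule_type` and `translate` helpers, the submodule name, and the example URL are invented for illustration, and Python's `configparser` is only a rough stand-in for git's own config parser. The three commands in the table are the ones named in the message; the supported-flag sets are assumptions.

```python
import configparser

# Hypothetical mapping (step 1): VCS type -> git subcommand ->
# (equivalent command prefix, flags assumed to be supported by that tool).
VCS_COMMANDS = {
    "dolt": {"commit": (["dolt", "commit"], {"-m"})},
    "dvc": {"push": (["dvc", "push"], set())},
    "lakectl": {"diff": (["lakectl", "diff"], set())},
}

# Hypothetical .gitmodules content using the proposed "type" field (step 3).
GITMODULES = """\
[submodule "datasets"]
path = datasets
url = https://example.com/datasets
type = dolt
"""


def submodule_type(gitmodules_text, name):
    """Return the proposed "type" of a submodule, defaulting to "git"."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitmodules_text)
    return parser.get(f'submodule "{name}"', "type", fallback="git")


def translate(vcs_type, git_args):
    """Translate git-style arguments into the command line of another VCS."""
    if vcs_type == "git":
        return ["git", *git_args]
    subcommand, *rest = git_args
    try:
        prefix, supported_flags = VCS_COMMANDS[vcs_type][subcommand]
    except KeyError:
        raise ValueError(f"{vcs_type!r} has no mapping for {subcommand!r}") from None
    for arg in rest:
        # Reject options that the mapping does not declare as supported.
        if arg.startswith("-") and arg not in supported_flags:
            raise ValueError(f"{vcs_type!r} does not support {arg!r} for {subcommand!r}")
    return [*prefix, *rest]


if __name__ == "__main__":
    vcs = submodule_type(GITMODULES, "datasets")
    print(translate(vcs, ["commit", "-m", "update labels"]))
```

In this sketch a submodule without a "type" entry falls back to plain git, which matches the default described in step 3, and an unmapped command or flag is rejected rather than passed through silently.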
Thread overview: 13+ messages

* 2022-05-10 16:11 Addison Klinke [this message]
* 2022-05-10 17:00 Junio C Hamano
* 2022-05-10 17:20 Jason Pyeron
* 2022-05-10 17:26 Addison Klinke
* 2022-05-10 18:26 rsbecker
* 2022-05-10 20:54 Philip Oakley
* 2022-06-01 12:44 Addison Klinke
* 2022-06-03 23:06 Philip Oakley
* 2022-06-04 2:01 rsbecker
* 2022-06-04 13:27 Philip Oakley
* 2022-06-04 15:57 rsbecker
* 2022-06-05 21:52 Philip Oakley
* 2022-06-06 14:53 Addison Klinke