From: Mickey Endito
To: "git@vger.kernel.org"
Date: Thu, 20 Aug 2020 08:03:02 +0000
Subject: Cloning subfolder as new root; subfolder as worktree

Dear all,

I'm currently missing a feature in git: being able to clone a subfolder as
the root or, phrased differently, to use a subfolder as a worktree.

This mail has become rather long, so here is an overview: First, I state the
problem and give some use cases.
Then, I list a couple of partial workarounds and other
approaches to the problem that are currently possible with git. Lastly, I
sketch an idea of how this feature could be implemented within git's object
model.

The Problem
-----------

Assume you have a git repository A with the following file structure:

  foo/a.txt
  bar/b.txt
  bar/baz/c.txt

What I want to achieve is creating a git repo where bar/ becomes the root /,
i.e. a repo B with the contents

  b.txt
  baz/c.txt

We can describe that as a lens or zoom onto repo A. I believe svn had that
capability, but I'm not sure.

Applications for that feature
-----------------------------

Notation: I use "repo:path" to indicate that path should be seen relative to
the repository repo.

* I have a repo which mimics (parts of) my entire file system (think
  configuration files). I'd like to be able to check out the subfolder
  repo:/etc/foobar in the actual filesystem folder /etc, while not checking
  out the rest of repo:/etc, as that would lead to disaster.

* I have a website project where the HTML files (which are not generated by
  a build script) are in repo:www/, and I'd like to check them out to
  /srv/www/ for deployment.

* Think of a big project with different components, possibly nested deep in
  a directory structure. We want to work on a single component somewhere
  down that structure, e.g. repo:client/new/x11/gtk/daemon/test/testapp. We
  could use git-sparse-checkout for that, but that would leave us with a lot
  of quasi-empty directories client/new/x11/gtk/daemon/test/, where in this
  case only the testapp directory is relevant.

Things/Workarounds I am aware of
--------------------------------

1. git submodules

You are probably thinking: use git-submodules. However, bar/ is not a
dependency or library, where that would make sense, but rather a part of
repo A. So the logical dependency is reversed: it's not A that depends on B
but rather B depends on A.
In some use cases, changes in bar/ require additional changes in
foo/ (in those use cases, B is like a read-only view).

2. git clone + filter-branch

We can clone the repo followed by a git-filter-branch (or its alternative
git-filter-repo):

  git clone /path/to/repo/A /path/to/repo/B
  cd /path/to/repo/B
  git filter-branch --subdirectory-filter bar -- --all

This creates a sort of read-only clone, but it has massive drawbacks:

* We cannot do a simple git pull to update repo B to the new state of
  repo A. To do that we have to clone and filter-branch it again.
* It changes commit IDs.
* We cannot push changes done in B back to A.

3. git-subtree

We can use git-subtree to split off the subdirectory and then clone the
generated branch as repo B, like so:

  # in repo A
  cd /path/to/repo/A
  git subtree split --prefix=bar --annotate 'bar: ' --branch branch-bar
  git clone -b branch-bar /path/to/repo/A /path/to/repo/B

Here we have:

* It requires support from repo A, which must generate branch-bar.
* Repo A now must contain two commit histories (the main branch and
  branch-bar) of the same logical history. In particular, the commit IDs
  differ for the same logical commit in the main branch and branch-bar.
* branch-bar must be regenerated every time. I have not (yet) investigated
  whether git-subtree is capable of continuing a split from the last
  commit. So far I have only managed to make it recreate all commits in
  branch-bar (though with the same commit IDs as before).
* Because the commit IDs of branch-bar do not change (at least when called
  with the same arguments), we can use git pull to update repo B.
* We can push changes in repo B back to repo A in branch-bar. But I
  currently do not see a simple method to incorporate these changes into
  the main branch.

4. git-sparse-checkout

We can use git-sparse-checkout like so:

  git clone --no-checkout /path/to/repo/A /path/to/repo/B
  cd /path/to/repo/B
  git sparse-checkout init
  git sparse-checkout set bar

This is kind of close to what I want, in the sense that we can push and
pull and the commit IDs are unaltered. However, it gets the directory
structure totally wrong, which is a no-go in some of the above use cases.

An idea for a solution
----------------------

The following is an idea of how the above feature could be implemented.
This is just a rough sketch, and I have not thought about how this approach
would interact with other git tooling.

We add a --subfolder argument (for example; the name is up for debate) to
git clone:

  git clone --subfolder bar /path/to/repo/A /path/to/repo/B

This clones the complete repo A but checks out the contents of path bar in
the root directory. HEAD points to the (full) commit. Additionally,
somewhere(tm) we store that we have zoomed in to see only the paths in bar
(maybe git-worktree can be extended for that?).

That is, content under bar is treated like a checked-out repo, while
everything else is treated like a bare repo. (This at least should be the
guideline when thinking about the behaviour git should provide.)

Doing a git push or git pull updates the repo as usual, but when checking
out files to the working directory, only the files under bar/ are
considered.

When editing and committing files in repo B, the following would be a sane
thing to do: the (old) tree of the current HEAD is taken, and then the
subtree corresponding to bar is replaced with the tree in the index. That
way we generate a full, valid commit which can be pushed back to repo A.

If we switch to (check out) a branch/commit that has no bar/ directory, then
the checked-out copy should be empty. If we add something and commit it,
then the parent tree objects of the new commit should be altered to contain
the path bar.
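The commit step described above (take HEAD's tree, replace the bar subtree with the tree of the index, or add it if HEAD has no bar/) can already be approximated with existing plumbing. This is a minimal sketch, not an implementation of the proposal: it assumes a single-level prefix bar/, and since no zoomed checkout exists, it simulates the zoomed index with a separate index file via GIT_INDEX_FILE (the file name zoom-index and the commit message are made up for illustration).

```shell
#!/bin/sh
# Sketch: splice a "zoomed" index (rooted at bar/) back into a full commit.
# ASSUMPTION: the zoomed index is simulated with a separate index file;
# the name "zoom-index" is hypothetical. Only a one-level prefix is handled.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Build repo A: foo/a.txt bar/b.txt bar/baz/c.txt
mkdir -p foo bar/baz
echo a > foo/a.txt
echo b > bar/b.txt
echo c > bar/baz/c.txt
git add .
git commit -q -m "initial"

# Simulated zoomed index: load the tree of bar/ as if it were the root.
zoom_index="$repo/.git/zoom-index"
GIT_INDEX_FILE=$zoom_index git read-tree HEAD:bar

# "Edit" b.txt in the zoomed view and stage it (path relative to bar/).
blob=$(echo "b, edited" | git hash-object -w --stdin)
GIT_INDEX_FILE=$zoom_index git update-index --cacheinfo "100644,$blob,b.txt"

# 1. The tree of the zoomed index is the new content of bar/.
bar_tree=$(GIT_INDEX_FILE=$zoom_index git write-tree)

# 2. Take HEAD's root tree, drop the old bar entry (if any), add the new one.
#    git mktree normalizes the entry order, so appending is fine.
root_tree=$( { git ls-tree HEAD | awk -F '\t' '$2 != "bar"'
               printf '040000 tree %s\tbar\n' "$bar_tree"; } | git mktree )

# 3. Record a full commit on top of HEAD and advance the current branch.
#    The main index and working tree of this temp repo are NOT updated.
commit=$(git commit-tree -p HEAD -m "bar: edit b.txt" "$root_tree")
git update-ref HEAD "$commit"

git show --stat --oneline HEAD
```

Because the awk filter is a no-op when HEAD has no bar entry, the same splice also covers the case of committing into a commit without bar/. A deeper prefix such as client/new/x11/gtk/daemon/test/testapp would need this splice repeated for every parent tree up to the root.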
Since git does not track directories, a missing bar/ directory should work
out as expected.

Merge conflicts happen. If they happen for files inside the bar directory,
then we can do our usual stuff. Given git's flexibility, the commits/trees
being merged may also conflict outside of the bar directory. In that case we
cannot produce a working copy of the commit. Thus, it seems appropriate to
abort whatever we are doing and tell the user to use a full clone for doing
the merge.

When cloning repo B to repo C, there are no restrictions whatsoever, as B
has a full copy of the repository (just not checked out). So when looking
from "outside", repo A and repo B are indistinguishable. Thus the following
works:

  git clone --subfolder bar /path/to/repo/A /path/to/repo/B
  git clone /path/to/repo/B /path/to/repo/C

Repos A and C are the same (both without a zoom).

  git clone --subfolder bar /path/to/repo/A /path/to/repo/B
  git clone --subfolder foo /path/to/repo/B /path/to/repo/C1
  git clone --subfolder foo /path/to/repo/A /path/to/repo/C2

Repos C1 and C2 are the same (both zoomed to foo/).

Non-goals:

The following (weird?) thing is outside the scope of this idea: zooming
into two (or more) directories simultaneously, e.g. repo A:

  foo1/bar/...
  foo2/foo3/baz/...

and with the hypothetical

  git clone --subfolder foo1 --subfolder foo2/foo3

we get repo B:

  bar/...
  baz/...

Converting a zoomed repo B into a full (non-bare) repo A is not part of it
either, although I think this could easily be achieved by deleting the
reference to the subfolder and doing a `git reset --hard` on the working
copy.

Summary
-------

The problem is not about the size of the checkout (which sparse-checkout
tackles), nor about the size of the repo or the amount of data that needs to
be downloaded (both of which clone --depth tackles). It's about getting the
directory structure in repo B right while also keeping a strong link to
repo A as upstream to pull (and maybe push) changes.
If I have missed any approach to a solution, I'd like to hear about it.

Best
Mickey