git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Derrick Stolee <stolee@gmail.com>,
	Lars Schneider <larsxschneider@gmail.com>,
	git <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Duy Nguyen <pclouds@gmail.com>
Subject: Re: worktrees vs. alternates
Date: Wed, 16 May 2018 17:34:34 +0200
Message-ID: <87k1s3bomt.fsf@evledraar.gmail.com>
In-Reply-To: <0f19f9f8-d215-622e-5090-1341c013babc@linuxfoundation.org>


On Wed, May 16 2018, Konstantin Ryabitsev wrote:

> On 05/16/18 09:02, Derrick Stolee wrote:
>> This is the biggest difference. You cannot have the same ref checked out
>> in multiple worktrees, as they both may edit that ref. The alternates
>> allow you to share data in a "read only" fashion. If you have one repo
>> that is the "base" repo that manages that objects dir, then that is
>> probably a good way to reduce the duplication. I'm not familiar with
>> what happens when a "child" repo does 'git gc' or 'git repack', will it
>> delete the local objects that it sees exist in the alternate?
>
> The parent repo is not keeping track of any other repositories that may
> be using it for alternates, which is why you basically:
>
> 1. never run auto-gc in the parent repo
> 2. repack it manually using -Ad to keep loose objects that other repos
> may be borrowing (but we don't know if they are)
> 3. never prune the parent repo, because this may delete objects other
> repos are borrowing
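A minimal sketch of that routine for the parent, assuming a bare parent repository whose path is passed as the first argument (parent_maintenance is a hypothetical name). Setting gc.auto=0 covers step 1, gc.pruneExpire=never guards step 3 against an accidental "git gc", and "repack -A -d" is step 2:

```shell
# Hypothetical helper; $1 is the path to the bare "parent" repository
# that other repositories list in their objects/info/alternates.
parent_maintenance() {
    parent=$1
    # Step 1: disable "git gc --auto" in the parent.
    git -C "$parent" config gc.auto 0 || return 1
    # Step 3: if "git gc" is ever run by hand, don't let it prune.
    git -C "$parent" config gc.pruneExpire never || return 1
    # Step 2: consolidate into one pack; -A turns unreachable objects from
    # the old packs into loose objects instead of dropping them, since a
    # child may still be borrowing them.
    git -C "$parent" repack -A -d
}
```

Run it by hand (or from cron) instead of relying on auto-gc, e.g. parent_maintenance /srv/git/parent.git, where the path is a placeholder.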
>
> Very infrequently you may consider this extra set of maintenance steps:
>
> 1. Find every repo mentioning the parent repository in their alternates
> 2. Repack them without the -l switch (which copies all the borrowed
> objects into those repos)
> 3. Once all child repos have been repacked this way, prune the parent
> repo (it's safe now)
> 4. Repack child repos again, this time with the -l flag, to get your
> savings back.
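The rarer full cycle could look roughly like the sketch below. It rests on several assumptions: all repositories live under one root directory, paths contain no whitespace, and each child's alternates file holds the parent's objects directory as an absolute, symlink-resolved path. find_children and full_cycle are hypothetical names.

```shell
# Hypothetical helpers. Assumes paths without whitespace and that each
# alternates file lists the parent's objects dir as an absolute real path.

# Print every repository under $1 whose alternates file lists $2.
find_children() {
    find "$1" -path '*/objects/info/alternates' -type f |
    while IFS= read -r alt; do
        if grep -qxF "$2" "$alt"; then
            printf '%s\n' "${alt%/objects/info/alternates}"
        fi
    done
}

full_cycle() {
    root=$1 parent=$2
    parent_objects=$(cd "$parent/objects" && pwd -P) || return 1
    # Step 1: find every repo borrowing from the parent.
    children=$(find_children "$root" "$parent_objects")
    # Step 2: repack WITHOUT -l, copying borrowed objects into each child.
    for child in $children; do
        git -C "$child" repack -a -d || return 1
    done
    # Step 3: every child now holds its own copy of what it references,
    # so pruning the parent is safe.
    git -C "$parent" prune || return 1
    # Step 4: repack WITH -l, leaving out objects the parent still has.
    for child in $children; do
        git -C "$child" repack -a -d -l || return 1
    done
}
```

Usage would be something like full_cycle /srv/git /srv/git/parent.git (placeholder paths). Nothing should push to the children while this runs.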
>
> I would heartily love a way to teach git-repack to recognize when an
> object it's borrowing from the parent repo is in danger of being pruned.
> The cheapest way of doing this would probably be to hardlink loose
> objects into its own objects directory and only consider "safe" objects
> those that are part of the parent repository's pack. This should make
> alternates a lot safer, just in case git-prune happens to run by accident.

I may have missed some edge case, but I believe this entire workaround
isn't needed if you guarantee that the parent repo doesn't contain any
objects that will get un-referenced.

You'd do that in the common case by cloning with --single-branch and,
depending on your setup, --no-tags (if you delete tags). This is assuming
that your HEAD branch points to something like a "master" that doesn't
get rewound.
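A sketch of that setup, assuming a bare parent and bare children, with the upstream URL, paths and branch name supplied by the caller (make_parent and make_child are illustrative names; --no-tags needs Git 2.14 or later):

```shell
# Hypothetical helpers; $upstream, $parent, $child and $branch are
# placeholders chosen by the caller.
make_parent() {
    upstream=$1 parent=$2 branch=${3:-master}
    # Import only one branch that never gets rewound, and no tags, so
    # nothing in the parent can later become unreferenced.
    git clone --bare --single-branch --no-tags --branch "$branch" \
        "$upstream" "$parent"
}

make_child() {
    upstream=$1 parent=$2 child=$3
    # --reference records the parent's objects directory in the child's
    # objects/info/alternates, so the child borrows what the parent has.
    git clone --bare --reference "$parent" "$upstream" "$child"
}
```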

The problem you're describing happens if, say, you clone git.git with
the "pu" branch in the parent: child repos end up referencing those
objects, and when the parent GCs after "pu" is rewound, the child repos
break. Thus your elaborate work-around.

But that situation isn't possible in the first place if you only ever
import the "master" branch, or other references guaranteed not to
change.

Of course that has the trade-off that every child repo needs to get its
own objects for the "next" branch, "pu", etc. But those are
comparatively tiny.

I wasn't aware of -l (--local), or had forgotten about it. I thought
we didn't have it, and that the "child" repos would just keep growing
over time, i.e. never get rid of their own copies of objects that are
later fetched into the parent too (the parent may only get them after
the child does, say if it's fetched in a daily cronjob). Good to know
that's not the case.

With that --local flag, the trade-off of not fetching "next", "pu",
etc. should become irrelevant over time: as those commits migrate to
"master" they'll get de-duplicated, or alternatively GC'd by the child
repos if they don't make it.
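A sketch of such a periodic job, assuming the parent was cloned with a remote named origin and that the children are passed as arguments (refresh is a hypothetical name). The refspec deliberately has no leading "+", so if upstream ever rewinds the branch the fetch fails instead of leaving objects unreferenced in the parent:

```shell
# Hypothetical helper: $1 = bare parent, $2 = branch, rest = child repos.
refresh() {
    parent=$1 branch=$2
    shift 2
    # Fast-forward only: a rewound upstream branch is rejected.
    git -C "$parent" fetch --no-tags origin \
        "refs/heads/$branch:refs/heads/$branch" || return 1
    # -l (--local) leaves out of each child's new pack any object
    # the parent now has.
    for child in "$@"; do
        git -C "$child" repack -a -d -l || return 1
    done
}
```

Called from cron as, e.g., refresh /srv/git/parent.git master /srv/git/child1.git /srv/git/child2.git (placeholder paths).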

>> GVFS uses alternates in this same way: we create a drive-wide "shared
>> object cache" that GVFS manages. We put our prefetch packs filled with
>> commits and trees in there, and any loose objects that are downloaded
>> via the object virtualization are placed as loose objects in the
>> alternate. We also store the multi-pack-index and commit-graph in that
>> alternate. This means that the only objects in each src dir are those
>> created by the developer doing their normal work.
>
> I'm very interested in GVFS, because it would certainly make my life
> easier maintaining source.codeaurora.org, which is many thousands of
> repos that are mostly forks of the same stuff. However, GVFS appears to
> only exist for Windows (hint-hint, nudge-nudge). :)

This should make you happy:

https://arstechnica.com/gadgets/2017/11/microsoft-and-github-team-up-to-take-git-virtual-file-system-to-macos-linux/

But I don't know what the current status is or where it can be followed.
