git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Mike Hommey <mh@glandium.org>,
	Git mailing list <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Eric Wong <e@80x24.org>
Subject: Re: Git packs friendly to block-level deduplication
Date: Wed, 24 Jan 2018 23:47:27 +0100
Message-ID: <87a7x2yiv4.fsf@evledraar.gmail.com>
In-Reply-To: <xmqq607qc2v6.fsf@gitster.mtv.corp.google.com>


On Wed, Jan 24 2018, Junio C. Hamano jotted:

> Mike Hommey <mh@glandium.org> writes:
>
>> FWIW, I sidestep the problem entirely by using alternates.
>
> That's a funny way to use the word "side-step", I would say, as the
> alternate object store support is there exactly for this use case.

Things you can't do with alternates that block-level de-duplication
gives you:

 1. Your filesystem may be mounted from some NFS host that does
    block-level deduplication internally against other content you don't
    have permission to access. Think of the /home directories of a bunch
    of dev VMs that you know will have the same repos cloned (along with
    most of the same FS content, e.g. the OS).

    In this case the storage can de-duplicate blocks purely as an
    implementation detail, without git knowing about it, as long as git
    (or any other program using the FS) can be coerced into writing the
    same blocks that other gits on other machines write, at least most
    of the time.

 2. Same as #1, but for e.g. chroot'd /home directories on a local
    non-NFS filesystem.

 3. Even if the repos are all on the same host, they may just be ad-hoc
    clones made in /home by different users. It's easy to write
    something in /etc/gitconfig to give them all the same repack
    settings. It's much harder to maintain some git-clone wrapper that
    implicitly adds --reference to all clones (they'll not know about
    it, or forget to use it), or that goes hunting around for checkouts
    and adds alternates after the fact.

 4. With alternates you always need to maintain some blessed "clone from
    this" repo that can't go away, lest everything cloned from it become
    corrupt and need manual repair. If you're aiming to just save
    storage, block-level deduplication may be a better trade-off.
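
To make "writing the same blocks" concrete, here is a rough sketch of
why alignment matters. It is a toy model, not how any particular
storage backend works: the 4096-byte fixed block size, the random bytes
standing in for pack contents, and the helper names block_hashes() and
shared_blocks() are all assumptions made up for illustration.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed block size; real storage systems vary


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return the SHA-256 digest of each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a, b):
    """Count blocks of b whose content already exists among a's blocks."""
    seen = set(block_hashes(a))
    return sum(1 for h in block_hashes(b) if h in seen)


# Hypothetical stand-in for pack contents: 64 blocks of pseudo-random bytes.
rng = random.Random(0)
pack = bytes(rng.getrandbits(8) for _ in range(64 * BLOCK_SIZE))

# Another machine writes byte-identical data: every block is shared.
same = shared_blocks(pack, pack)

# The same data shifted by one byte no longer lines up on block
# boundaries, so a fixed-block deduplicator has nothing to share.
shifted = shared_blocks(pack, b"\x00" + pack)

print(same, shifted)
```

In this model the identical copy shares all of its blocks, while the
copy shifted by a single byte shares none, even though the content is
almost the same. That is the sense in which git would need to be
coerced into writing the same blocks, not just the same objects.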

Also, once you clone with --reference, doesn't the local clone only add
new objects as you "git fetch", never pruning them if the same objects
appear in the alternate later on? Or am I misremembering things?

I mainly have use-cases #1 & #3. Both could be made to use alternates
with some hassle (e.g. for #1, by exposing a separate read-only copy of
"these are alternates" to each VM), but it seemed worthwhile to see if
repack could be made more friendly to block-level deduplication, as
deploying that is easier.

