From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Mike Hommey <mh@glandium.org>,
Git mailing list <git@vger.kernel.org>, Jeff King <peff@peff.net>,
Eric Wong <e@80x24.org>
Subject: Re: Git packs friendly to block-level deduplication
Date: Wed, 24 Jan 2018 23:47:27 +0100
Message-ID: <87a7x2yiv4.fsf@evledraar.gmail.com>
In-Reply-To: <xmqq607qc2v6.fsf@gitster.mtv.corp.google.com>
On Wed, Jan 24 2018, Junio C. Hamano jotted:
> Mike Hommey <mh@glandium.org> writes:
>
>> FWIW, I sidestep the problem entirely by using alternatives.
>
> That's a funny way to use the word "side-step", I would say, as the
> alternate object store support is there exactly for this use case.
Things you can't do with alternates that block-level de-duplication
gives you:
1. Your filesystem may be mounted from some NFS host that does
block-level deduplication internally against other content you don't
have permission to access. Think of the /home of a bunch of dev VMs
you know will have the same repos cloned (along with most of the same
FS content, e.g. the OS).
In this case the storage can de-duplicate blocks purely as an
implementation detail, without git knowing about it, as long as git
(or any other program using the FS) can be coerced into writing the
same blocks other gits on other machines write, at least most of the
time.
2. Ditto NFS, but e.g. chroot'd /home on a local non-NFS filesystem.
3. Even if the repos are all on the same host they may just be ad-hoc
cloned in /home by different users. It's easy to write something in
/etc/gitconfig to give them the same repack settings, less so to
maintain some git-clone wrapper that implicitly adds --reference
(they'll not know, or forget) to all clones, or goes hunting around
for checkouts and adds alternates after the fact.
4. With alternates you always need to maintain some blessed "clone from
this" repo that can't go away, lest everything cloned from it becomes
corrupt and needs manual repair. If you're aiming to just save
storage, block-level deduplication may be a better trade-off.
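
To make point 4 concrete, here is a minimal sketch (not part of git) of a
check for dangling alternates. It relies on the documented repository
layout, where objects/info/alternates lists one alternate object directory
per line and relative entries are resolved against the objects directory.
The function name missing_alternates is hypothetical, and the sketch
assumes plain, unquoted paths.

```python
from pathlib import Path


def missing_alternates(git_dir: str) -> list[Path]:
    """Return alternate object directories listed for git_dir that no longer exist.

    Hypothetical helper for illustration; assumes one plain path per line.
    """
    objects = Path(git_dir) / "objects"
    alt_file = objects / "info" / "alternates"
    if not alt_file.is_file():
        return []  # this repository borrows from no alternates

    missing = []
    for line in alt_file.read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue
        path = Path(entry)
        if not path.is_absolute():
            # Relative entries are relative to the object database,
            # not to the repository root.
            path = objects / path
        if not path.is_dir():
            missing.append(path)
    return missing
```

A non-empty result for any clone means the repo it borrows from has moved
or been deleted, which is exactly the situation that needs manual repair.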
Also, once you clone with --reference, doesn't the local clone only add
new objects as you "git fetch", never pruning those if the same objects
appear in the alternate later on? Or am I misremembering things?
I mainly have use-cases #1 & #3. Although they could both be made to use
alternates with some hassle (e.g. for #1, exposing a separate read-only
copy of "these are alternates" to each VM), it seemed worthwhile to see
if repack could be made more block-level deduplication friendly, as
deploying that is easier.
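
As a rough illustration of what "writing the same blocks" means, the
sketch below estimates how much of one file a block-level deduplicator
could share with another. It assumes fixed-size, offset-aligned blocks of
4096 bytes, which is only an assumption (real storage systems differ), and
the function names are hypothetical. It says nothing about how git lays
out packs; it only shows why byte-identical content at different offsets
does not deduplicate.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real deduplicating storage varies


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Hash each fixed-size, offset-aligned block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def shared_block_fraction(a: bytes, b: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of b's blocks whose content also appears as a block of a."""
    seen = set(block_digests(a, block_size))
    blocks_b = block_digests(b, block_size)
    if not blocks_b:
        return 0.0
    return sum(d in seen for d in blocks_b) / len(blocks_b)


if __name__ == "__main__":
    # Deterministic pseudo-random payload, 8 blocks long.
    base = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(1024))
    identical = bytes(base)
    shifted = b"\x00" + base  # same content, moved by a single byte

    print(shared_block_fraction(base, identical))
    print(shared_block_fraction(base, shifted))
```

The identical copy shares every block, while the copy shifted by one byte
shares essentially none, even though the two differ by a single byte. That
is why two packs holding the same objects in a different order, or with
different deltas, give the storage layer little to work with.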