git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Son Luong Ngoc <sluongng@gmail.com>
Cc: Martin Fick <mfick@codeaurora.org>,
	Taylor Blau <ttaylorr@github.com>, Sun Chao <16657101987@163.com>,
	Taylor Blau <me@ttaylorr.com>,
	Sun Chao via GitGitGadget <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>
Subject: Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration
Date: Tue, 20 Jul 2021 08:29:17 +0200
Message-ID: <878s21wl4z.fsf@evledraar.gmail.com>
In-Reply-To: <CAL3xRKee3YmOrV_-4Tu6FmJyRnS2y-tdiAmXp5TjzL_WxQNrtw@mail.gmail.com>


On Thu, Jul 15 2021, Son Luong Ngoc wrote:

> Hi folks,
>
> On Wed, Jul 14, 2021 at 10:03 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> *nod*
>>
>> FWIW at an ex-job I helped systems administrators who'd produced such a
>> broken backup-via-rsync create a hybrid version as an interim
>> solution. I.e. it would sync the objects via git transport, and do an
>> rsync on a whitelist (or blacklist), so pickup config, but exclude
>> objects.
>>
>> "Hybrid" because it was in a state of needing to deal with manual
>> tweaking of config.
>>
>> But usually someone who's needing to thoroughly solve this backup
>> problem will inevitably end up with wanting to drive everything that's
>> not in the object or refstore from some external system, i.e. have
>> config be generated from puppet, a database etc., ditto for alternates
>> etc.
>>
>> But even if you can't get to that point (or don't want to) I'd say aim
>> for the hybrid system.
>
> FWIW, we are running our repo on top of a somewhat flickery DRBD setup and
> we decided to use both
>
>   `git clone --upload-pack 'git -c transfer.hiderefs="!refs" upload-pack' --mirror`
>
> and
>
>   `tar`
>
> to create 2 separate snapshots for backup in parallel (full backup,
> not incremental).
>
> In case of recovery (manual), we first rely on the git snapshot, and if
> there are any missing objects/refs, we try to get them from the tarball.

That sounds good, and similar to what I described with that "hybrid"
setup.
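
For the non-object half of such a hybrid setup, something like the
following is roughly what I mean. It's only a sketch: the
`sync_metadata` helper, the whitelist and the paths are made up for
illustration, and it uses plain `cp` where a real setup would more
likely use rsync with an include/exclude list. The copied files go into
a directory next to the mirror rather than into it, so they can't
clobber the mirror's own config.

```shell
#!/bin/sh
set -eu

# sync_metadata SRC_GIT_DIR DST_DIR (hypothetical helper):
# copy only whitelisted files out of a git directory, never objects/ or refs/.
sync_metadata() {
    src=$1
    dst=$2
    # The whitelist is an assumption; adjust it to what your site tweaks by hand.
    for f in config description info/exclude; do
        if [ -f "$src/$f" ]; then
            mkdir -p "$(dirname "$dst/$f")"
            cp -p "$src/$f" "$dst/$f"
        fi
    done
}

# Demo against a throwaway repository.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init --quiet "$work/src"
git -C "$work/src" config demo.setting kept
git -C "$work/src" -c user.name=demo -c user.email=demo@example.invalid \
    commit --quiet --allow-empty -m "first"

# Objects and refs travel over git's transport (--no-local avoids a plain file copy).
git clone --quiet --mirror --no-local "$work/src" "$work/backup/repo.git"

# Hand-maintained config and friends are copied separately.
sync_metadata "$work/src/.git" "$work/backup/meta"
```

On recovery you'd restore the mirror first and then re-apply whatever
you need from the `meta` directory by hand.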

>>
>> This isn't some purely theoretical concern b.t.w., the system using
>> rsync like this was producing repos that wouldn't fsck all the time, and
>> it wasn't such a busy site.
>>
>> I suspect (but haven't tried) that for someone who can't easily change
>> their backup solution they'd get most of the benefits of git-native
>> transport by having their "rsync" sync refs, then objects, not the other
>> way around. Glob order dictates that most backup systems will do
>> objects, then refs (which will of course, at that point, refer to
>> nonexisting objects).
>>
>> It's still not safe, you'll still be subject to races, but probably a
>> lot better in practice.
>
> I would love to get some guidance in official documentation on what is the best
> practice around handling git data on the server side.
>
> Is git-clone + git-bundle the go-to solution?
> Should tar/rsync be avoided entirely, or is there a trade-off?

I should have tempered some of those comments: it's perfectly fine in
general to use tar+rsync for "backing up" git repositories in certain
contexts. E.g. when I switch laptops, it's what I do to grab my data.

The problem is when the data isn't at rest, i.e. in the context of an
active server.

There you start moving along a spectrum that runs from "sure, it's
fine" to "this is such a bad idea that nobody should pursue it".

If you're running a setup where you're starting to submit patches to
git.git you're probably at the far end of that spectrum.

Whether it's clone, push, fetch, bundle etc. doesn't really matter; the
important part is that you're using git's pack transport mechanism to
ferry updates around, which gives you guarantees rsync+tar can't,
particularly in the face of concurrently updated data.
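
As a rough sketch of what that can look like (the `backup_repo` helper,
the paths and the demo identity are all made up for illustration, and
I'm assuming a backup host that can reach the repository): do a mirror
clone on the first run, a pruning fetch on later runs, and fsck the
result. Note `--no-local`: when the source is a local path, a plain
clone copies or hardlinks the files under objects/ instead of going
through the pack transport, which is exactly what we're trying to
avoid here.

```shell
#!/bin/sh
set -eu

# backup_repo SRC DST (hypothetical helper):
# mirror SRC into the bare repository DST over git's own transport.
backup_repo() {
    src=$1
    dst=$2
    if [ -d "$dst" ]; then
        # Later runs: fetch new objects and refs, drop refs deleted upstream.
        git -C "$dst" fetch --quiet --prune origin
    else
        # First run: mirror every ref; --no-local forces the pack transport.
        git clone --quiet --mirror --no-local "$src" "$dst"
    fi
    # Fail loudly if the backup is not self-consistent.
    git -C "$dst" fsck --no-progress --no-dangling
}

# Demo against a throwaway repository.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init --quiet "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.invalid \
    commit --quiet --allow-empty -m "first"
backup_repo "$work/src" "$work/backup.git"

git -C "$work/src" -c user.name=demo -c user.email=demo@example.invalid \
    commit --quiet --allow-empty -m "second"
backup_repo "$work/src" "$work/backup.git"
```

Swapping the clone/fetch for push or bundle wouldn't change the point:
the receiving side only ever sees refs whose objects actually arrived.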

