git@vger.kernel.org mailing list mirror (one of many)
* How to efficiently backup a bare repository?
@ 2018-11-23 10:23 Guilhem Bonnefille
  2018-11-24 22:44 ` Ævar Arnfjörð Bjarmason
From: Guilhem Bonnefille @ 2018-11-23 10:23 UTC
  To: Git List

Hi,

I'm managing many bare repositories for development teams.

One service we want to offer is to let developers retrieve an old state
of the repository from up to 30 days back. For example, one developer
(accidentally) removed (push -f) a branch/tag and realized a few days
later (after vacation) that it was an error.

What is the best approach to do this?

Currently, we use a classical approach, backing up the whole repo every
day. But this is far from efficient as:
- we accumulate 30 copies of the repository
- due to the packing logic of Git, even if the content is mostly similar
from one backup to another, there is no way to deduplicate.

Are there any tricks based on the reflog? Even for deleted refs
(branches/tags)? Is there any tooling that plays with the internals of
Git to offer such a feature, like copying all refs into a timestamped
refs directory to retain objects?

Thanks in advance for any tips to improve the backup.
-- 
Guilhem BONNEFILLE
-=- JID: guyou@im.apinc.org MSN: guilhem_bonnefille@hotmail.com
-=- mailto:guilhem.bonnefille@gmail.com
-=- http://nathguil.free.fr/


* Re: How to efficiently backup a bare repository?
  2018-11-23 10:23 How to efficiently backup a bare repository? Guilhem Bonnefille
@ 2018-11-24 22:44 ` Ævar Arnfjörð Bjarmason
  2018-11-25  1:16   ` Junio C Hamano
From: Ævar Arnfjörð Bjarmason @ 2018-11-24 22:44 UTC
  To: Guilhem Bonnefille; +Cc: Git List


On Fri, Nov 23 2018, Guilhem Bonnefille wrote:

> I'm managing many bare repositories for development teams.
>
> One service we want to offer is to let developers retrieve an old state
> of the repository from up to 30 days back. For example, one developer
> (accidentally) removed (push -f) a branch/tag and realized a few days
> later (after vacation) that it was an error.
>
> What is the best approach to do this?
>
> Currently, we use a classical approach, backing up the whole repo every
> day. But this is far from efficient as:
> - we accumulate 30 copies of the repository
> - due to the packing logic of Git, even if the content is mostly similar
> from one backup to another, there is no way to deduplicate.
>
> Are there any tricks based on the reflog? Even for deleted refs
> (branches/tags)? Is there any tooling that plays with the internals of
> Git to offer such a feature, like copying all refs into a timestamped
> refs directory to retain objects?
>
> Thanks in advance for any tips to improve the backup.

There's no easy out of the box way to do exactly what you've
described. A few things come to mind:

a) If you can simply deny non-fast-forwards that's ideal. E.g. for some
   branches you care about, or tags. This is how most of us deal with
   this issue in practice. I.e. have some "blessed" refs that matter,
   and if someone rewinds their own topic branch that's their own
   problem.

b) You could, as you touched upon, have a post-receive hook that detects
   non-fast-forwards, and e.g. pushes a clobbered "master" or "v1.0" to
   some backup repo's 2018-11-24-23-39-04-master or whatever. Then users
   could grab old versions of refs from that repo. I do a similar thing
   at work to archive certain refs (old tags), but without renaming
   them.

   The advantage is that you get all refs ever, the disadvantage is that
   you're not going to get a copy of the repo as it was N days ago,
   it'll need to be manually pieced together.

c) Git could be made block-level de-duplication friendly. I was planning
   to work on it, but it's a small enough itch that I haven't pursued
   it. Initial results look promising, though:
   https://public-inbox.org/git/20180125002942.GA21184@sigill.intra.peff.net/

d) Note that if you're e.g. rsyncing repos that are actively being
   pushed into, you're likely to sometimes end up with corrupt repos
   unless you're very careful about what you grab and in what
   order. Best to back up repos with "git fetch".

e) If you're burned by one-off cases like this dev going away for 30
   days, you could bump the default expiry that comes with git from 2
   weeks to e.g. 6 weeks. It's still a manual process to recover data
   (with fsck etc), but at least it's there.
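
Options (a), (b) and (e) above might be sketched in shell as follows. This is only a sketch under stated assumptions, not a vetted setup: the function names, the `BACKUP_REPO` variable and the `refs/archive/<timestamp>/` layout are hypothetical, and it assumes the "default expiry" in (e) means `gc.pruneExpire`. `receive.denyNonFastForwards` and `receive.denyDeletes` apply to every ref in the repository; protecting only some "blessed" refs would need an update or pre-receive hook instead. With (a) enabled for all refs, the hook in (b) has nothing left to catch, so the two are alternatives.

```shell
#!/bin/sh
# (a) and (e): repository-wide settings on the serving repository.
# Usage: protect_repo /path/to/repo.git
protect_repo() {
	git -C "$1" config receive.denyNonFastForwards true  # refuse "push -f"
	git -C "$1" config receive.denyDeletes true          # refuse ref deletion
	# Assumption: "default expiry" means gc.pruneExpire (default 2.weeks.ago).
	git -C "$1" config gc.pruneExpire "6.weeks.ago"
}

# (b): body of a post-receive hook. Reads "<old> <new> <ref>" lines on
# stdin and pushes every clobbered or deleted value to BACKUP_REPO
# (hypothetical variable) as refs/archive/<timestamp>/<ref without refs/>.
archive_clobbered_refs() {
	stamp=$(date -u +%Y-%m-%d-%H-%M-%S)
	while read -r old new ref
	do
		case $old in
		*[!0]*) ;;          # the ref existed before this push
		*) continue ;;      # all-zero old value: newly created ref
		esac
		case $ref in
		refs/heads/*)
			# A fast-forwarded branch keeps its old tip reachable.
			# For a deletion (all-zero new value) the check fails,
			# so the old tip is archived.
			git merge-base --is-ancestor "$old" "$new" 2>/dev/null &&
				continue
			;;
		esac
		# Any other change (rewind, deletion, moved tag) is archived.
		git push -q "$BACKUP_REPO" \
			"$old:refs/archive/$stamp/${ref#refs/}" </dev/null
	done
}
```

In a real hook, `hooks/post-receive` would set `BACKUP_REPO` and call `archive_clobbered_refs` with the hook's standard input. As noted in (b), this keeps every old ref value but does not give a snapshot of the whole repository as of a given day.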


* Re: How to efficiently backup a bare repository?
  2018-11-24 22:44 ` Ævar Arnfjörð Bjarmason
@ 2018-11-25  1:16   ` Junio C Hamano
From: Junio C Hamano @ 2018-11-25  1:16 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: Guilhem Bonnefille, Git List

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> There's no easy out of the box way to do exactly what you've
> described. A few things come to mind:
> ...

Wouldn't it suffice to have a cron job that runs something like

	D=$(date +"%Y-%m-%d")
	git fetch "$serving" "refs/*:refs/backup-$D/*"

on the back-up box to fetch from the repository on the box the
end-users push into once a day?  In the back-up repository, the
refs/backup-2018-11-25/heads/master reference would be today's tip
of the master branch of the serving repository.  You can set the
expiry timeout to "now" (i.e. "gc" will immediately drop unreachable
objects, and that is fine because you explicitly have refs to pin
these objects anyway), get the dedup from "git fetch" for free,
repack the backup repository as a whole, and dropping the whole
refs/backup-2018-10-25/* hierarchy on 2018-11-25 is all you need to
expire the refs.
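
The cron job and the expiry step described here could be fleshed out roughly as below. It is a sketch rather than an exact recipe: `daily_backup` and its arguments are hypothetical names. Instead of date arithmetic it keeps the newest `keep` dated namespaces, which amounts to the same thing when the job runs once a day. `--no-tags` is added on the assumption that tag auto-following would otherwise also create `refs/tags/*` outside the dated namespace.

```shell
#!/bin/sh
# Run from inside the back-up repository, e.g. once a day from cron.
# Usage: daily_backup <serving-repo> [<YYYY-MM-DD>] [<snapshots to keep>]
daily_backup() {
	serving=$1
	D=${2:-$(date +%Y-%m-%d)}
	keep=${3:-30}

	# Snapshot every ref of the serving repository under refs/backup-$D/.
	git fetch -q --no-tags "$serving" "refs/*:refs/backup-$D/*" || return

	# List the dated namespaces newest first, skip the newest $keep,
	# and delete every ref below the remaining (older) ones.
	git for-each-ref --format='%(refname)' |
	sed -n 's|^\(refs/backup-[^/]*\)/.*|\1|p' |
	sort -ur |
	tail -n "+$((keep + 1))" |
	while read -r ns
	do
		git for-each-ref --format='delete %(refname)' "$ns"
	done |
	git update-ref --stdin

	# Everything worth keeping is pinned by a ref, so unreachable
	# objects can be dropped immediately while repacking.
	git gc --quiet --prune=now
}
```

To recover a deleted branch, a developer could then fetch it back from the back-up repository, for example with `git fetch <backup-repo> refs/backup-2018-11-20/heads/topic:refs/heads/topic-restored` (the repository and branch names here are placeholders).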

You may want to play with the ref-advertisement limiting options in
recent versions of Git, if growing the number of "have"s by 30x is too
much for the common ancestry negotiation.  But that is a small
implementation detail.

