git@vger.kernel.org mailing list mirror (one of many)
* File versioning based on shallow Git repositories?
@ 2018-04-12 18:01 Hallvard Breien Furuseth
  2018-04-12 18:47 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 9+ messages in thread
From: Hallvard Breien Furuseth @ 2018-04-12 18:01 UTC
  To: git

Can I use a shallow Git repo for file versioning, and regularly purge
history older than e.g. 2 weeks?  Purged data MUST NOT be recoverable.

Or is there a backup tool based on shallow Git cloning which does this?
Push/pull to another shallow repo would be nice but is not required.
The files are text files up to 1/4 GB, usually with few changes.


If using Git - I see "git fetch --depth" can shorten history now.
How do I do that without 'fetch', in the origin repo?
Also Documentation/technical/shallow.txt describes some caveats, I'm
not sure how relevant they are.

To purge old data -
  git config core.logallrefupdates false
  git gc --prune=now --aggressive
Anything else?

I'm guessing that without --aggressive, some expired info might be
deduced from studying the packing of the remaining objects.  Don't
know if we'll be required to be that paranoid.

-- 
Hallvard

* Re: File versioning based on shallow Git repositories?
  2018-04-12 18:01 File versioning based on shallow Git repositories? Hallvard Breien Furuseth
@ 2018-04-12 18:47 ` Ævar Arnfjörð Bjarmason
  2018-04-12 19:36   ` Hallvard Breien Furuseth
  0 siblings, 1 reply; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-04-12 18:47 UTC
  To: Hallvard Breien Furuseth; +Cc: git


On Thu, Apr 12 2018, Hallvard Breien Furuseth wrote:

> Can I use a shallow Git repo for file versioning, and regularly purge
> history older than e.g. 2 weeks?  Purged data MUST NOT be recoverable.
>
> Or is there a backup tool based on shallow Git cloning which does this?
> Push/pull to another shallow repo would be nice but is not required.
> The files are text files up to 1/4 GB, usually with few changes.
>
>
> If using Git - I see "git fetch --depth" can shorten history now.
> How do I do that without 'fetch', in the origin repo?
> Also Documentation/technical/shallow.txt describes some caveats, I'm
> not sure how relevant they are.
>
> To purge old data -
>   git config core.logallrefupdates false
>   git gc --prune=now --aggressive
> Anything else?
>
> I'm guessing that without --aggressive, some expired info might be
> deduced from studying the packing of the remaining objects.  Don't
> know if we'll be required to be that paranoid.

The shallow feature is not for this use-case, but there's a much easier
solution that I've used for exactly this use-case, e.g. taking backups
of SQL dumps that delta-compress well, and then throwing out old
backups.

You:

1. Create a backup.git repo.
2. Each time you make a backup, check out a new orphan branch (see "git
   checkout --orphan").
3. Copy the files over and commit them. "git log" at this point shows
   one commit, no matter how many times you've done this before.
4. Create a tag for this backup, e.g. one named after the current
   time, then delete the branch.
5. Apply a retention period to the tags, e.g. keep only the last 30
   tags if you take daily backups and want 30 days of backups.

Then as soon as you delete the tags the old commit will be unreferenced,
and you can make git-gc delete the data.

You'll still be able to `git diff` between tags, even though they have
unrelated histories, and the files will still delta-compress.
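
The steps above can be sketched as a small shell script. This is a
minimal sketch under stated assumptions, not a tested backup tool: the
tag naming scheme, the KEEP value, the dump.sql file name and the
take_backup/expire_backups helpers are all hypothetical, and a throwaway
repo in a temp directory stands in for backup.git.  It also expires the
reflogs before running gc, since a reflog entry would otherwise keep an
old commit reachable.  Note that gc only removes objects from Git's
object store; whether the bytes are recoverable from the disk itself is
outside Git's control.

```shell
#!/bin/sh
# Sketch: one orphan commit per backup, one tag per backup, purge by deleting tags.
set -eu

KEEP=2                                   # hypothetical retention: number of tags to keep
repo=$(mktemp -d)/backup                 # throwaway stand-in for backup.git
git init -q "$repo"
cd "$repo"
git config user.name "Backup Sketch"     # placeholder identity
git config user.email "backup@example.invalid"

take_backup() {                          # $1 = tag name, $2 = stand-in file content
    git checkout -q --orphan tmp-backup  # new branch with no parent commits
    git rm -rfq --ignore-unmatch .       # drop files carried over from the previous backup
    printf '%s\n' "$2" > dump.sql        # stands in for "copy the files over"
    git add -A
    git commit -q -m "backup $1"
    git tag "$1"
    git checkout -q --detach             # step off the branch so it can be deleted
    git branch -q -D tmp-backup
}

expire_backups() {
    # Tag names sort chronologically here, so everything after the newest $KEEP is old.
    git tag -l 'backup-*' | sort -r | tail -n +"$((KEEP + 1))" |
    while read -r tag; do git tag -d "$tag" >/dev/null; done
    git reflog expire --expire=now --all # reflog entries would keep old commits alive
    git gc -q --prune=now                # drop the now-unreferenced objects
}

take_backup backup-2018-04-10 "rows: 1"
first=$(git rev-parse backup-2018-04-10) # remembered only to check that it is gone later
take_backup backup-2018-04-11 "rows: 1 2"
take_backup backup-2018-04-12 "rows: 1 2 3"
expire_backups
git tag -l
```

After the run, "git cat-file -e $first" should fail, because the oldest
backup commit is no longer in the object store, while "git diff
backup-2018-04-11 backup-2018-04-12" still works between the two
remaining unrelated commits.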

* Re: File versioning based on shallow Git repositories?
  2018-04-12 18:47 ` Ævar Arnfjörð Bjarmason
@ 2018-04-12 19:36   ` Hallvard Breien Furuseth
  2018-04-12 20:46     ` Ævar Arnfjörð Bjarmason
  2018-04-13  8:52     ` Jakub Narebski
  0 siblings, 2 replies; 9+ messages in thread
From: Hallvard Breien Furuseth @ 2018-04-12 19:36 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On 12. april 2018 20:47, Ævar Arnfjörð Bjarmason wrote:
> 1. Create a backup.git repo
> 2. Each time you make a backup, checkout a new orphan branch, see "git
>     checkout --orphan"
> 3. You copy the files over, commit them, "git log" at this point shows
>     one commit no matter if you've done this before.
> 4. You create a tag for this backup, e.g. one named after the current
>     time, delete the branch.
> 5. You then have a retention period for the tags, e.g. only keep the
>     last 30 tags if you do daily backups for 30 days of backups.
> 
> Then as soon as you delete the tags the old commit will be unreferenced,
> and you can make git-gc delete the data.

Nice!
Why the tags though, instead of branches named after the current time?

One --orphan branch/tag per day with several commits would work for me.

Also maybe it'll be worthwhile to generate .git/info/grafts in a local
clone of the repo to get back easily visible history.  No grafts in
the original repo, grafts mess things up.

-- 
Hallvard

* Re: File versioning based on shallow Git repositories?
  2018-04-12 19:36   ` Hallvard Breien Furuseth
@ 2018-04-12 20:46     ` Ævar Arnfjörð Bjarmason
  2018-04-12 21:07       ` Rafael Ascensao
  2018-04-13  8:52     ` Jakub Narebski
  1 sibling, 1 reply; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-04-12 20:46 UTC
  To: Hallvard Breien Furuseth; +Cc: git


On Thu, Apr 12 2018, Hallvard Breien Furuseth wrote:

> On 12. april 2018 20:47, Ævar Arnfjörð Bjarmason wrote:
>> 1. Create a backup.git repo
>> 2. Each time you make a backup, checkout a new orphan branch, see "git
>>     checkout --orphan"
>> 3. You copy the files over, commit them, "git log" at this point shows
>>     one commit no matter if you've done this before.
>> 4. You create a tag for this backup, e.g. one named after the current
>>     time, delete the branch.
>> 5. You then have a retention period for the tags, e.g. only keep the
>>     last 30 tags if you do daily backups for 30 days of backups.
>>
>> Then as soon as you delete the tags the old commit will be unreferenced,
>> and you can make git-gc delete the data.
>
> Nice!
> Why the tags though, instead of branches named after the current time?

Because tags are idiomatic in git for a reference that doesn't change,
but sure, if you'd like branches that'll work too.

> One --orphan branch/tag per day with several commits would work for me.
>
> Also maybe it'll be worthwhile to generate .git/info/grafts in a local
> clone of the repo to get back easily visible history.  No grafts in
> the original repo, grafts mess things up.

Maybe, I have not tried this with grafts.

* Re: File versioning based on shallow Git repositories?
  2018-04-12 20:46     ` Ævar Arnfjörð Bjarmason
@ 2018-04-12 21:07       ` Rafael Ascensao
  2018-04-12 21:22         ` Hallvard Breien Furuseth
  0 siblings, 1 reply; 9+ messages in thread
From: Rafael Ascensao @ 2018-04-12 21:07 UTC
  To: Ævar Arnfjörð Bjarmason
  Cc: Hallvard Breien Furuseth, Git Mailing List

Would initializing a repo with an empty root commit, tagging it 'base', and
then running

  $ git rebase --onto base 'master@{30 days ago}' master

be viable?

The --orphan & tag is perhaps more robust, since it's "harder" to move
tags around.

--
Rafael Ascensão

* Re: File versioning based on shallow Git repositories?
  2018-04-12 21:07       ` Rafael Ascensao
@ 2018-04-12 21:22         ` Hallvard Breien Furuseth
  0 siblings, 0 replies; 9+ messages in thread
From: Hallvard Breien Furuseth @ 2018-04-12 21:22 UTC
  To: Rafael Ascensao, Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On 12. april 2018 23:07, Rafael Ascensao wrote:
> Would initializing a repo with an empty root commit, tagging it 'base', and
> then running $ git rebase --onto base 'master@{30 days ago}' master
> be viable?

No... my question was confused from the beginning.  With such large files
I _shouldn't_ have history (or grafts), otherwise Git spends a lot of CPU
time creating diffs when I look at a commit, or worse, when I try git log.
Which I discovered quickly when trying real data instead of test-data:-)

Ævar's suggestion was exactly right in that respect.  Thanks again!

-- 
Hallvard

* Re: File versioning based on shallow Git repositories?
  2018-04-12 19:36   ` Hallvard Breien Furuseth
  2018-04-12 20:46     ` Ævar Arnfjörð Bjarmason
@ 2018-04-13  8:52     ` Jakub Narebski
  2018-04-13 11:12       ` Johannes Schindelin
  1 sibling, 1 reply; 9+ messages in thread
From: Jakub Narebski @ 2018-04-13  8:52 UTC
  To: Hallvard Breien Furuseth; +Cc: Ævar Arnfjörð Bjarmason, git

Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no> writes:

> Also maybe it'll be worthwhile to generate .git/info/grafts in a local
> clone of the repo to get back easily visible history.  No grafts in
> the original repo, grafts mess things up.

Just a reminder: modern Git has "git replace", a modern and safe
alternative to the grafts file.
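
As a rough illustration of that alternative, the sketch below joins two
unrelated root commits with "git replace --graft" so that "git log"
shows them as one line of history.  The repo, the day1/day2 branch
names, the backup-1/backup-2 tags and the data.txt file are all made up
for the example; it only assumes a Git new enough to have the --graft
option.

```shell
#!/bin/sh
# Sketch: make one orphan commit appear to descend from another via git replace.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"                  # placeholder identity
git config user.email "example@example.invalid"

# Two unrelated root commits, standing in for two orphan backups.
for n in 1 2; do
    git checkout -q --orphan "day$n"
    echo "version $n" > data.txt
    git add data.txt
    git commit -q -m "backup $n"
    git tag "backup-$n"
done

before=$(git rev-list --count backup-2)         # history length without any replacement
git replace --graft backup-2 backup-1           # pretend backup-1 is the parent of backup-2
after=$(git rev-list --count backup-2)          # history length as seen through the replacement
raw=$(git --no-replace-objects rev-list --count backup-2)   # the original objects, untouched
echo "before=$before after=$after raw=$raw"
```

The original commit objects are not rewritten; the replacement lives in
a ref under refs/replace/, and "git replace -d backup-2" removes it
again.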

Best,
-- 
Jakub Narębski

* Re: File versioning based on shallow Git repositories?
  2018-04-13  8:52     ` Jakub Narebski
@ 2018-04-13 11:12       ` Johannes Schindelin
  2018-04-13 21:57         ` Jakub Narebski
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Schindelin @ 2018-04-13 11:12 UTC
  To: Jakub Narebski
  Cc: Hallvard Breien Furuseth, Ævar Arnfjörð Bjarmason,
	git

Hi Kuba,

On Fri, 13 Apr 2018, Jakub Narebski wrote:

> Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no> writes:
> 
> > Also maybe it'll be worthwhile to generate .git/info/grafts in a local
> > clone of the repo to get back easily visible history.  No grafts in
> > the original repo, grafts mess things up.
> 
> Just a reminder: modern Git has "git replace", a modern and safe
> alternative to the grafts file.

Right!

Maybe it is time to start deprecating grafts? They *do* cause problems,
such as weird "missing objects" problems when trying to fetch into, or
push from, a repository with grafts. These problems are not shared by the
`git replace` method.

I just sent out a patch to add a deprecation warning.

Ciao,
Dscho

* Re: File versioning based on shallow Git repositories?
  2018-04-13 11:12       ` Johannes Schindelin
@ 2018-04-13 21:57         ` Jakub Narebski
  0 siblings, 0 replies; 9+ messages in thread
From: Jakub Narebski @ 2018-04-13 21:57 UTC
  To: Johannes Schindelin
  Cc: Hallvard Breien Furuseth, Ævar Arnfjörð Bjarmason,
	git

Hello Johannes,

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> On Fri, 13 Apr 2018, Jakub Narebski wrote:
>> Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no> writes:
>> 
>>> Also maybe it'll be worthwhile to generate .git/info/grafts in a local
>>> clone of the repo to get back easily visible history.  No grafts in
>>> the original repo, grafts mess things up.
>> 
>> Just a reminder: modern Git has "git replace", a modern and safe
>> alternative to the grafts file.
>
> Right!
>
> Maybe it is time to start deprecating grafts? They *do* cause problems,
> such as weird "missing objects" problems when trying to fetch into, or
> push from, a repository with grafts. These problems are not shared by the
> `git replace` method.

Also you can propagate "git replace" info with clone / fetch / push.
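
A minimal sketch of that propagation, assuming an explicit refspec is
used (a plain clone does not fetch refs/replace/* by default).  The
src/dst repos, the day1/day2 branch names, the tags and the data.txt
file are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch: replacement refs travel with an explicit refs/replace/* refspec.
set -eu

top=$(mktemp -d)
git init -q "$top/src"
cd "$top/src"
git config user.name "Example"                  # placeholder identity
git config user.email "example@example.invalid"

# Two unrelated root commits, then a replacement that links them.
for n in 1 2; do
    git checkout -q --orphan "day$n"
    echo "version $n" > data.txt
    git add data.txt
    git commit -q -m "backup $n"
    git tag "backup-$n"
done
git replace --graft backup-2 backup-1

git clone -q "$top/src" "$top/dst"
cd "$top/dst"
plain=$(git rev-list --count backup-2)          # the clone has no replace refs yet
git fetch -q origin 'refs/replace/*:refs/replace/*'
joined=$(git rev-list --count backup-2)         # now seen through the fetched replacement
echo "plain=$plain joined=$joined"
```

Pushing works the same way in the other direction, e.g.
"git push origin 'refs/replace/*'".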

> I just sent out a patch to add a deprecation warning.

Thank you for this.

-- 
Jakub Narębski
