git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Martin Langhoff <martin.langhoff@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	e@80x24.org, Git Mailing List <git@vger.kernel.org>
Subject: Re: git svn clone/fetch hits issues with gc --auto
Date: Wed, 10 Oct 2018 13:27:06 +0200
Message-ID: <878t36f3ed.fsf@evledraar.gmail.com>
In-Reply-To: <CACPiFCL0oTjN+-aYgKEDtKC0gYwkv6RLMwakdJV85PJ5XQej6g@mail.gmail.com>


On Wed, Oct 10 2018, Martin Langhoff wrote:

> Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many
> loose objects" makes sense to me.
>
> - remove unactionable warning
> - as the warning is gone, no gc.log is produced
> - subsequent gc runs don't exit due to gc.log
>
> My very humble +1 on that.
>
> As for downsides... if we have truly tons of _recent_ loose objects,
> it'll ... take disk space? I'm fine with that.

As Jeff's
https://public-inbox.org/git/20180716175103.GB18636@sigill.intra.peff.net/
and my https://public-inbox.org/git/878t69dgvx.fsf@evledraar.gmail.com/
note, it's a bit more complex than that.

I.e.:

 - The warning is actionable: you can decide to up your expiration
   policy.

 - We use this warning as a proxy for "let's not run for a day";
   otherwise we'll just grind on gc --auto trying to consolidate
   possibly many hundreds of thousands of loose objects, only to find
   that none of them can be pruned because they run into the expiry
   policy. With the warning we retry that once per day, which sucks less.

 - This conflation of the user-visible warning and the policy is an
   emergent effect of how the different gc pieces interact, which, as I
   note in the linked thread(s), sucks.

   But we can't just yank one piece away (as Jonathan's patch does)
   without throwing the baby out with the bathwater.

   It will mean that, e.g., if you have 10k loose objects in your git.git
   and created them just now, every time you run anything that runs
   "gc --auto" we'll fork to the background, peg a core at 100% CPU for
   2-3 minutes or whatever it is, only to get nowhere, and then do the same
   thing again in ~3 minutes when you run your next command.

 - I think you may be underestimating some of the cases where this ends
   up taking a huge amount of disk space (and now we'll at least issue
   *some* warning). See my
   https://public-inbox.org/git/87fu6bmr0j.fsf@evledraar.gmail.com/
   where a repo's .git went from 2.5G to 30G due to being stuck in this
   mode.
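The interaction described above can be modeled roughly as follows. This is a simplified sketch in shell, not git's actual gc code: should_auto_gc and simulate are hypothetical helpers, and the defaults (gc.auto=6700, gc.logExpiry=1.day) are assumed from git-config(1). The scenario mirrors the git.git example: 10k loose objects, all too recent to prune, and some command triggering "gc --auto" every 3 minutes.

```shell
#!/usr/bin/env bash
# Simplified model, NOT git's real gc code: the "gc --auto" gate that
# the gc.log warning ends up driving. Defaults assumed from
# git-config(1): gc.auto=6700 and gc.logExpiry=1.day (86400 seconds).

# should_auto_gc LOOSE GC_AUTO LOG_AGE LOG_EXPIRY
# LOG_AGE is the age of gc.log in seconds, or -1 if there is no gc.log.
# Returns 0 (run gc) or 1 (skip).
should_auto_gc() {
    local loose=$1 gc_auto=$2 log_age=$3 log_expiry=$4
    # Not enough loose objects: nothing to do.
    (( loose > gc_auto )) || return 1
    # A gc.log younger than gc.logExpiry means the last run left a
    # warning, so back off instead of grinding again.
    if (( log_age >= 0 && log_age < log_expiry )); then
        return 1
    fi
    return 0
}

# simulate COMMANDS INTERVAL USE_LOG
# Runs COMMANDS commands, INTERVAL seconds apart, each triggering
# "gc --auto". All 10k loose objects are too recent to prune, so every
# gc run leaves them in place; with USE_LOG=1 it also writes gc.log.
# Prints how many times gc actually ran.
simulate() {
    local commands=$1 interval=$2 use_log=$3
    local loose=10000 gc_auto=6700 log_expiry=86400
    local now=0 log_written=-1 runs=0 i log_age
    for (( i = 0; i < commands; i++ )); do
        if (( log_written < 0 )); then
            log_age=-1
        else
            log_age=$(( now - log_written ))
        fi
        if should_auto_gc "$loose" "$gc_auto" "$log_age" "$log_expiry"; then
            runs=$(( runs + 1 ))
            if (( use_log )); then
                log_written=$now   # the warning is recorded in gc.log
            fi
        fi
        now=$(( now + interval ))
    done
    echo "$runs"
}

echo "gc runs, 20 commands 3 min apart, with gc.log:    $(simulate 20 180 1)"
echo "gc runs, 20 commands 3 min apart, without gc.log: $(simulate 20 180 0)"
```

With the backoff, gc runs once and then stays quiet until gc.log is a day old; without it, every one of the 20 commands kicks off another run that cannot prune anything.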

> For more aggressive gc options, thoughts:
>
>  - Do we always consider git gc --prune=now "safe" in a "won't delete
> stuff the user is likely to want" sense? For example -- are the
> references from reflogs enough safety?

The --prune=now option is not generally safe, for the reasons noted in
the "NOTES" section of "git help gc".
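To make the NOTES point concrete: an unreachable loose object is only deleted if its mtime is not newer than the --prune cutoff (gc.pruneExpire, "2.weeks.ago" by default), and that age margin is what protects objects another process has just written but not yet referenced. --prune=now moves the cutoff to the current time, so the margin disappears. Below is a toy model in shell; prune_candidates is a hypothetical helper, the timestamps are made up, and the boundary handling is a simplification rather than a copy of git's prune code.

```shell
#!/usr/bin/env bash
# Toy model, not git's implementation: which unreachable loose objects
# would an expiry-based prune delete, given each object's mtime?
#
# prune_candidates CUTOFF MTIME...
# Prints every MTIME that is not newer than CUTOFF (would be deleted);
# anything newer than CUTOFF is kept.
prune_candidates() {
    local cutoff=$1 mtime
    shift
    for mtime in "$@"; do
        if (( mtime <= cutoff )); then
            echo "$mtime"
        fi
    done
}

now=1700000000                           # arbitrary "current" Unix time
two_weeks=$(( 14 * 24 * 60 * 60 ))       # assumed gc.pruneExpire default
month_old=$(( now - 30 * 24 * 60 * 60 )) # long-abandoned garbage
just_written=$now                        # e.g. from a concurrent process

echo "pruned with a 2-week expiry: $(prune_candidates $(( now - two_weeks )) "$month_old" "$just_written" | xargs)"
echo "pruned with --prune=now:     $(prune_candidates "$now" "$month_old" "$just_written" | xargs)"
```

The month-old object is pruned either way, but the object written "just now" only survives the default cutoff; with --prune=now it is deleted, which is exactly the race the NOTES section warns about.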

>  - Even if we don't, for some commands it should be safe to run git gc
> --prune=now at the end of the process, for example an import that
> generates a new git repo (git svn clone).

Yeah, I don't see a problem with that. I didn't know about this
interesting use-case, i.e. that "git svn clone" will create a lot of
loose objects.

As seen in my
https://public-inbox.org/git/87tvm3go42.fsf@evledraar.gmail.com/ I'm
working on making "gc --auto" run at the end of clone for unrelated
reasons, i.e. so we generate the commit-graph. It seems like "git svn
clone" could do something similar.
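If an import did that, the shape might be something like the sketch below. import_svn_repo is a hypothetical wrapper, not something git-svn ships. It assumes nobody else is using the brand-new repository until the import finishes, which is what makes the --prune=now at the end acceptable. It also relies on configuration passed with "git -c" being exported to child git processes, so the "git gc --auto" calls git-svn makes during the import should see gc.auto=0 and do nothing.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of git-svn: import an SVN repository
# with automatic gc disabled, then do a single full gc at the end.
#
# import_svn_repo URL DIR
import_svn_repo() {
    local url=$1 dir=$2
    # "-c gc.auto=0" is inherited by the "git gc --auto" calls that
    # git-svn makes during the import, turning them into no-ops.
    git -c gc.auto=0 svn clone "$url" "$dir" || return
    # The repository is brand new and nothing else is writing to it,
    # so pruning unreachable objects immediately is acceptable here
    # (unlike --prune=now on a repository that is in active use).
    git -C "$dir" gc --prune=now
}
```

Usage would be e.g. `import_svn_repo https://svn.example.org/repo myrepo` (placeholder URL). The design choice is to trade the many per-batch auto gc runs for one explicit cleanup once the import is known to be complete.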

So it's creating a lot of garbage during its cloning process that can
just be immediately thrown away? What is it doing? Using the object
store as a scratch pad for its own temporary state?

> m
> On Tue, Oct 9, 2018 at 10:49 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Forwarding to Jonathan, as I think this is an interesting supporting
>> vote for the topic that we were stuck on.
>>
>> Eric Wong <e@80x24.org> writes:
>>
>> > Martin Langhoff <martin.langhoff@gmail.com> wrote:
>> >> Hi folks,
>> >>
>> >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
>> >> I hit the gc error:
>> >>
>> >> warning: There are too many unreachable loose objects; run 'git prune'
>> >> to remove them.
>> >> gc --auto: command returned error: 255
>> >
>> > GC can be annoying when that happens... For git-svn, perhaps
>> > this can be appropriate to at least allow the import to continue:
>> >
>> > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
>> > index 76b2965905..9b0caa3d47 100644
>> > --- a/perl/Git/SVN.pm
>> > +++ b/perl/Git/SVN.pm
>> > @@ -999,7 +999,7 @@ sub restore_commit_header_env {
>> >  }
>> >
>> >  sub gc {
>> > -     command_noisy('gc', '--auto');
>> > +     eval { command_noisy('gc', '--auto') };
>> >  };
>> >
>> >  sub do_git_commit {
>> >
>> >
>> > But yeah, somebody else who works on git regularly could
>> > probably stop repack from writing thousands of loose
>> > objects (and instead write a self-contained pack with
>> > those objects, instead).  I haven't followed git closely
>> > lately, myself.


Thread overview: 26+ messages
     [not found] <CACPiFCJZ83sqE7Gaj2pa12APkBF5tau-C6t4_GrXBWDwcMnJHg@mail.gmail.com>
2018-10-09 22:51 ` git svn clone/fetch hits issues with gc --auto Martin Langhoff
2018-10-09 23:45   ` Eric Wong
2018-10-10  2:49     ` Junio C Hamano
2018-10-10 11:01       ` Martin Langhoff
2018-10-10 11:27         ` Ævar Arnfjörð Bjarmason [this message]
2018-10-10 11:41           ` Martin Langhoff
2018-10-10 11:48             ` Ævar Arnfjörð Bjarmason
2018-10-10 16:51               ` Jonathan Nieder
2018-10-10 17:46                 ` Jeff King
2018-10-10 19:27                   ` [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 Ævar Arnfjörð Bjarmason
2018-10-10 20:35                     ` Jeff King
2018-10-10 20:59                       ` Ævar Arnfjörð Bjarmason
2018-10-11  0:38                         ` Jeff King
2018-10-10 20:56                     ` Jonathan Nieder
2018-10-10 21:05                       ` Ævar Arnfjörð Bjarmason
2018-10-10 21:14                         ` Jonathan Nieder
2018-10-10 21:36                           ` Junio C Hamano
2018-10-10 21:51                             ` Jonathan Nieder
2018-10-10 22:16                               ` Ævar Arnfjörð Bjarmason
2018-10-10 22:25                                 ` Jonathan Nieder
2018-10-10 18:38                 ` git svn clone/fetch hits issues with gc --auto Ævar Arnfjörð Bjarmason
2018-10-10 11:43           ` Ævar Arnfjörð Bjarmason
2018-10-10 12:21           ` Junio C Hamano
2018-10-10 12:37             ` Ævar Arnfjörð Bjarmason
2018-10-10 16:38             ` Martin Langhoff
2018-10-10  8:04   ` Ævar Arnfjörð Bjarmason
