git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Elijah Newren <newren@gmail.com>
Cc: git@vger.kernel.org, jonathantanmy@google.com, jrnieder@gmail.com
Subject: Re: [PATCH] gc: do not warn about too many loose objects
Date: Mon, 16 Jul 2018 15:52:26 -0400	[thread overview]
Message-ID: <20180716195226.GB25189@sigill.intra.peff.net> (raw)
In-Reply-To: <20180716191505.857-1-newren@gmail.com>

On Mon, Jul 16, 2018 at 12:15:05PM -0700, Elijah Newren wrote:

> The basic problem here, at least for us, is that gc has enough
> information to know it could expunge some objects, but because of how
> it is structured in terms of several substeps (reflog expiration,
> repack, prune), the information is lost between the steps and it
> instead writes them out as unreachable objects.  If we could prune (or
> avoid exploding) loose objects that are only reachable from reflog
> entries that we are expiring, then the problem goes away for us.  (I
> totally understand that other repos may have enough unreachable
> objects for other reasons that Peff's suggestion to just pack up
> unreachable objects is still a really good idea.  But on its own, it
> seems like a waste since it's packing stuff that we know we could just
> expunge.)

No, we should have expunged everything that could be during the "repack"
and "prune" steps. We feed the expiration time to repack, so that it
knows to drop objects entirely instead of exploding them loose.

So the leftovers really are objects that cannot be deleted yet. You
could literally just do:

  find .git/objects/?? -type f |
  perl -lne 's{../.{38}$} and print "$1$2"' |
  git pack-objects .git/objects/pack/cruft-pack

But:

  - that will explode them out only to repack them, which is inefficient
    (if they're already packed, you can probably reuse deltas, not to
    mention the I/O savings)

  - there's the question of how to handle timestamps. Some of those
    objects may have been _about_ to expire, but now you've just put
    them in a brand-new pack that adds another 2 weeks to their life

  - the find above is sloppy, and will race with somebody adding new
    objects to the repo

So probably you want to have pack-objects write out the list of objects
it _would_ explode, rather than exploding them. And then before
git-repack deletes the old packs, put those into a new cruft pack. That
_just_ leaves the timestamp issue (which is discussed at length in the
thread I linked earlier).

> git_actual_garbage_collect() {
>     GITDIR=$(git rev-parse --git-dir)
> 
>     # Record all revisions stored in reflog before and after gc
>     git rev-list --no-walk --reflog >$GITDIR/gc.original-refs
>     git gc --auto
>     wait_for_background_gc_to_finish
>     git rev-list --no-walk --reflog >$GITDIR/gc.final-refs
> 
>     # Find out which reflog entries were removed
>     DELETED_REFS=$(comm -23 <(sort $GITDIR/gc.original-refs) <(sort $GITDIR/gc.final-refs))

This is too detailed, I think. There are other reasons to have
unreachable objects than expired reflogs. I think you really just want
to consider all unreachable objects (like the pack-objects thing I
mentioned above).

-Peff

  parent reply	other threads:[~2018-07-16 19:52 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 17:27 [PATCH] gc: do not warn about too many loose objects Jonathan Tan
2018-07-16 17:51 ` Jeff King
2018-07-16 18:22   ` Jonathan Nieder
2018-07-16 18:52     ` Jeff King
2018-07-16 19:09       ` Jonathan Nieder
2018-07-16 19:41         ` Jeff King
2018-07-16 19:54           ` Jonathan Nieder
2018-07-16 20:29             ` Jeff King
2018-07-16 20:37               ` Jonathan Nieder
2018-07-16 21:09                 ` Jeff King
2018-07-16 21:40                   ` Jonathan Nieder
2018-07-16 21:45                     ` Jeff King
2018-07-16 22:03                       ` Jonathan Nieder
2018-07-16 22:43                         ` Jeff King
2018-07-16 22:56                           ` Jonathan Nieder
2018-07-16 23:26                             ` Jeff King
2018-07-17  1:53                               ` Jonathan Nieder
2018-07-17  8:59                                 ` Ævar Arnfjörð Bjarmason
2018-07-17 14:03                                   ` Jonathan Nieder
2018-07-17 15:24                                     ` Ævar Arnfjörð Bjarmason
2018-07-17 20:27                                   ` Jeff King
2018-07-18 13:11                                     ` Ævar Arnfjörð Bjarmason
2018-07-18 17:29                                       ` Jeff King
2018-07-17 15:59                                 ` Duy Nguyen
2018-07-17 18:09                                 ` Junio C Hamano
2018-07-16 19:15 ` Elijah Newren
2018-07-16 19:19   ` Jonathan Nieder
2018-07-16 20:21     ` Elijah Newren
2018-07-16 20:35       ` Jeff King
2018-07-16 20:56         ` Jonathan Nieder
2018-07-16 21:12           ` Jeff King
2018-07-16 19:52   ` Jeff King [this message]
2018-07-16 20:16     ` Elijah Newren
2018-07-16 20:38       ` Jeff King
2018-07-16 21:09         ` Elijah Newren
2018-07-16 21:21           ` Jeff King
2018-07-16 22:07             ` Elijah Newren
2018-07-16 22:55               ` Jeff King
2018-07-16 23:06                 ` Elijah Newren
2018-07-16 21:31           ` Jonathan Nieder
2018-07-17  6:51 ` [PATCH v2 0/3] gc --auto: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17  6:53   ` [PATCH 1/3] gc: improve handling of errors reading gc.log Jonathan Nieder
2018-07-17 18:19     ` Junio C Hamano
2018-07-17 19:58     ` Jeff King
2018-07-17  6:54   ` [PATCH 2/3] gc: exit with status 128 on failure Jonathan Nieder
2018-07-17 18:22     ` Junio C Hamano
2018-07-17 19:59     ` Jeff King
2018-09-17 18:33       ` Jeff King
2018-09-17 18:40         ` Jonathan Nieder
2018-09-18 17:30           ` Jeff King
2018-07-17  6:57   ` [PATCH 3/3] gc: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17 20:13     ` Jeff King
2018-07-18 16:21       ` Junio C Hamano
2018-07-18 17:22         ` Jeff King
2018-07-18 18:19           ` Junio C Hamano
2018-07-18 19:06             ` Jeff King
2018-07-18 19:55               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180716195226.GB25189@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).