git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de>, git@vger.kernel.org
Subject: Re: non-smooth progress  indication for git fsck and git gc
Date: Thu, 16 Aug 2018 17:06:57 -0400	[thread overview]
Message-ID: <20180816210657.GA9291@sigill.intra.peff.net> (raw)
In-Reply-To: <20180816205556.GA8257@sigill.intra.peff.net>

On Thu, Aug 16, 2018 at 04:55:56PM -0400, Jeff King wrote:

> >  * We spend the majority of the ~30s on this:
> >    https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79
> 
> This is hashing the actual packfile. This is potentially quite long,
> especially if you have a ton of big objects.
> 
> I wonder if we need to do this as a separate step anyway, though. Our
> verification is based on index-pack these days, which means it's going
> to walk over the whole content as part of the "Indexing objects" step to
> expand base objects and mark deltas for later. Could we feed this hash
> as part of that walk over the data? It's not going to save us 30s, but
> it's likely to be more efficient. And it would fold the effort naturally
> into the existing progress meter.

Actually, I take it back. That's the nice, modern way we do it in
git-verify-pack. But git-fsck uses the ancient "just walk over all of
the idx entries method". It at least sorts in pack order, which is good,
but:

  - it's not multi-threaded, like index-pack/verify-pack

  - the index-pack way is actually more efficient than pack-ordering for
    the delta-base cache, because it actually walks the delta-graph in
    the optimal order

Once upon a time verify-pack used this same pack-check code, and we
switched in 3de89c9d42 (verify-pack: use index-pack --verify,
2011-06-03). So I suspect the right thing to do here is for fsck to
switch to that, too, and then delete pack-check.c entirely.

That may well involve us switching the progress to a per-pack view
(e.g., "checking pack 1/10 (10%)", followed by its own progress meter).
But I don't think that would be a bad thing. It's a more realistic view
of the work we're actually doing. Although perhaps it's misleading about
the total work remaining, because not all packs are the same size (so
you see that you're halfway through pack 2/10, but that's quite likely
to be half of the total work if it's the one gigantic pack).

-Peff

  reply	other threads:[~2018-08-16 21:07 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-16  6:54 non-smooth progress indication for git fsck and git gc Ulrich Windl
2018-08-16 15:18 ` Duy Nguyen
2018-08-16 16:05   ` Jeff King
2018-08-20  8:27   ` Antw: " Ulrich Windl
2018-08-16 15:57 ` Jeff King
2018-08-16 20:02   ` Jeff King
2018-08-16 22:10     ` Junio C Hamano
2018-08-16 20:35   ` Ævar Arnfjörð Bjarmason
2018-08-16 20:55     ` Jeff King
2018-08-16 21:06       ` Jeff King [this message]
2018-08-17 14:39         ` Duy Nguyen
2018-08-20  8:33       ` Antw: " Ulrich Windl
2018-08-20  8:57         ` Ævar Arnfjörð Bjarmason
2018-08-20  9:37           ` Ulrich Windl
2018-08-21  1:07           ` Jeff King
2018-08-21  6:20             ` Ulrich Windl
2018-08-21 15:21             ` Duy Nguyen
2018-09-01 12:53     ` Ævar Arnfjörð Bjarmason
2018-09-01 13:52       ` Ævar Arnfjörð Bjarmason
2018-09-02  7:46       ` Jeff King
2018-09-02  7:55         ` Jeff King
2018-09-02  8:55           ` Jeff King
2018-09-03 16:48             ` Ævar Arnfjörð Bjarmason
2018-09-07  3:30               ` Jeff King
2018-09-04 15:53           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180816210657.GA9291@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=Ulrich.Windl@rz.uni-regensburg.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).