From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de>, git@vger.kernel.org
Subject: Re: non-smooth progress indication for git fsck and git gc
Date: Sun, 2 Sep 2018 03:46:57 -0400
Message-ID: <20180902074656.GB18787@sigill.intra.peff.net>
In-Reply-To: <87y3clbcqf.fsf@evledraar.gmail.com>
On Sat, Sep 01, 2018 at 02:53:28PM +0200, Ævar Arnfjörð Bjarmason wrote:
> With this we'll get output like:
>
> $ ~/g/git/git -C ~/g/2015-04-03-1M-git/ --exec-path=$PWD fsck
> Checking object directories: 100% (256/256), done.
> Hashing: 100% (452634108/452634108), done.
> Hashing: 100% (1073741824/1073741824), done.
> Hashing: 100% (1073741824/1073741824), done.
> Hashing: 100% (1008001572/1008001572), done.
> Checking objects: 2% (262144/13064614)
> ^C
>
> All tests pass with this. Isn't it awesome? Except it's of course a
> massive hack, we wouldn't want to just hook into SHA1DC like this.
I still consider that output so-so; the byte counts are big and there's
no indication how many "hashing" lines we're going to see. It's also
broken up in a weird way (it's not one per file; it's one per giant
chunk we fed to sha1).
> The problem comes down to us needing to call git_hash_sha1_update() with
> some really huge input, that function is going to take a *long* time,
> and the only way we're getting incremental progress is:
>
> 1) If we ourselves split the input into N chunks
> 2) If we hack into the SHA1 library itself
>
> This patch does #2, but for this to be acceptable we'd need to do
> something like #1.
I think we could just do the chunking in verify_packfile(), couldn't we?
(And the .idx hash, if we really want to cover that case, but IMHO
that's way less interesting).
Something like this, which chunks it there, uses a per-packfile meter
(though still does not give any clue how many packfiles there are), and
shows a throughput meter.
diff --git a/pack-check.c b/pack-check.c
index d3a57df34f..c94223664f 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -62,10 +62,25 @@ static int verify_packfile(struct packed_git *p,
 	uint32_t nr_objects, i;
 	int err = 0;
 	struct idx_entry *entries;
+	struct progress *hashing_progress = NULL;
+	char *title = NULL;
+	off_t total_hashed = 0;

 	if (!is_pack_valid(p))
 		return error("packfile %s cannot be accessed", p->pack_name);

+	if (progress) {
+		/* Probably too long... */
+		title = xstrfmt("Hashing %s", p->pack_name);
+
+		/*
+		 * I don't think it actually works to have two progresses going
+		 * at the same time, because when the first one ends, we'll
+		 * cancel the alarm. But hey, this is a hacky proof of concept.
+		 */
+		hashing_progress = start_progress(title, 0);
+	}
+
 	the_hash_algo->init_fn(&ctx);
 	do {
 		unsigned long remaining;
@@ -75,9 +90,25 @@ static int verify_packfile(struct packed_git *p,
 			pack_sig_ofs = p->pack_size - the_hash_algo->rawsz;
 		if (offset > pack_sig_ofs)
 			remaining -= (unsigned int)(offset - pack_sig_ofs);
-		the_hash_algo->update_fn(&ctx, in, remaining);
+		while (remaining) {
+			int chunk = remaining < 4096 ? remaining : 4096;
+			the_hash_algo->update_fn(&ctx, in, chunk);
+			in += chunk;
+			remaining -= chunk;
+			total_hashed += chunk;
+			/*
+			 * The progress code needs tweaking to show throughputs
+			 * better for open-ended meters.
+			 */
+			display_throughput(hashing_progress, total_hashed);
+			display_progress(hashing_progress, 0);
+		}
 	} while (offset < pack_sig_ofs);
+
 	the_hash_algo->final_fn(hash, &ctx);
+	stop_progress(&hashing_progress);
+	free(title);
+
 	pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL);
 	if (hashcmp(hash, pack_sig))
 		err = error("%s pack checksum mismatch",
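
For readers who want to try the chunking idea outside of git, here is a
minimal standalone sketch. It is not git code: toy_ctx, toy_init(),
toy_update(), hash_in_chunks() and count_calls() are hypothetical names
made up for illustration, and the hash is FNV-1a standing in for
the_hash_algo->update_fn, not SHA-1. It only shows that feeding a
streaming hash in fixed-size chunks gives the same result as one big
update, while leaving a hook for a progress meter after every chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hash context: 64-bit FNV-1a, NOT SHA-1. */
struct toy_ctx {
	uint64_t state;
};

static void toy_init(struct toy_ctx *c)
{
	c->state = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */
}

/* Streaming update: may be called any number of times with any lengths. */
static void toy_update(struct toy_ctx *c, const unsigned char *in, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		c->state ^= in[i];
		c->state *= 0x100000001b3ULL; /* FNV-1a prime */
	}
}

/* Hypothetical progress hook, called with the running byte total. */
typedef void (*progress_fn)(size_t total_hashed, void *data);

/* Example hook: count how often we were called (data points to an int). */
static void count_calls(size_t total_hashed, void *data)
{
	(void)total_hashed;
	(*(int *)data)++;
}

/*
 * Feed "remaining" bytes to the hash in pieces of at most chunk_size,
 * invoking the hook (if any) after each piece. Returns the number of
 * bytes hashed.
 */
static size_t hash_in_chunks(struct toy_ctx *c, const unsigned char *in,
			     size_t remaining, size_t chunk_size,
			     progress_fn cb, void *data)
{
	size_t total = 0;

	assert(chunk_size > 0);
	while (remaining) {
		size_t chunk = remaining < chunk_size ? remaining : chunk_size;

		toy_update(c, in, chunk);
		in += chunk;
		remaining -= chunk;
		total += chunk;
		if (cb)
			cb(total, data);
	}
	return total;
}
```

Calling hash_in_chunks() on a 10000-byte buffer with a chunk size of
4096 invokes the hook three times (4096 + 4096 + 1808 bytes). The chunk
size is a trade-off: smaller chunks give a smoother meter at the cost of
more calls into the hook.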
Thread overview: 25+ messages
2018-08-16 6:54 non-smooth progress indication for git fsck and git gc Ulrich Windl
2018-08-16 15:18 ` Duy Nguyen
2018-08-16 16:05 ` Jeff King
2018-08-20 8:27 ` Antw: " Ulrich Windl
2018-08-16 15:57 ` Jeff King
2018-08-16 20:02 ` Jeff King
2018-08-16 22:10 ` Junio C Hamano
2018-08-16 20:35 ` Ævar Arnfjörð Bjarmason
2018-08-16 20:55 ` Jeff King
2018-08-16 21:06 ` Jeff King
2018-08-17 14:39 ` Duy Nguyen
2018-08-20 8:33 ` Antw: " Ulrich Windl
2018-08-20 8:57 ` Ævar Arnfjörð Bjarmason
2018-08-20 9:37 ` Ulrich Windl
2018-08-21 1:07 ` Jeff King
2018-08-21 6:20 ` Ulrich Windl
2018-08-21 15:21 ` Duy Nguyen
2018-09-01 12:53 ` Ævar Arnfjörð Bjarmason
2018-09-01 13:52 ` Ævar Arnfjörð Bjarmason
2018-09-02 7:46 ` Jeff King [this message]
2018-09-02 7:55 ` Jeff King
2018-09-02 8:55 ` Jeff King
2018-09-03 16:48 ` Ævar Arnfjörð Bjarmason
2018-09-07 3:30 ` Jeff King
2018-09-04 15:53 ` Junio C Hamano