git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: "René Scharfe" <l.s.r@web.de>
Cc: Keegan Carruthers-Smith <keegan.csmith@gmail.com>, git@vger.kernel.org
Subject: Re: git archive generates tar with malformed pax extended attribute
Date: Tue, 4 Jun 2019 16:53:42 -0400
Message-ID: <20190604205342.GA27861@sigill.intra.peff.net>
In-Reply-To: <c961d89d-db0b-597f-c183-81aa791c0987@web.de>

On Sun, Jun 02, 2019 at 06:58:48PM +0200, René Scharfe wrote:

> > That sounds about right. It's basically every version of every tree that
> > has a symlink. Did it make a noticeable difference in timing? Indexing
> > the whole kernel history is already a horribly slow process. :)
> 
> Right, I didn't notice a difference -- no patience for watching that
> thing to the end.  But here are some numbers for v2.21.0 vs. master with
> the patch:
> 
> Benchmark #1: git fsck
>   Time (mean ± σ):     307.775 s ±  9.054 s    [User: 307.173 s, System: 0.448 s]
>   Range (min … max):   294.052 s … 322.931 s    10 runs
> 
> Benchmark #2: ~/src/git/git fsck
>   Time (mean ± σ):     319.754 s ±  2.255 s    [User: 318.927 s, System: 0.671 s]
>   Range (min … max):   316.376 s … 323.747 s    10 runs
> 
> Summary
>   'git fsck' ran
>     1.04 ± 0.03 times faster than '~/src/git/git fsck'

I guess that's about what I'd expect. The bulk of the time in most repos
will go to fscking the actual blobs, I'd think. But hitting each tree
twice really is noticeable.
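
For reference, the "1.04 ± 0.03" in the quoted summary can be reproduced
from the two means and standard deviations. The sketch below assumes the
benchmark tool propagates the uncertainty of the ratio in the usual
first-order way for independent measurements; that formula is an
assumption, not something stated in this thread.

```python
import math

# Mean and standard deviation (seconds), copied from the quoted benchmark output.
fast_mean, fast_sd = 307.775, 9.054   # Benchmark #1: git fsck
slow_mean, slow_sd = 319.754, 2.255   # Benchmark #2: ~/src/git/git fsck

# How many times faster the first command ran than the second.
ratio = slow_mean / fast_mean

# Assumed first-order error propagation for a ratio of independent quantities.
ratio_sd = ratio * math.sqrt((fast_sd / fast_mean) ** 2 + (slow_sd / slow_mean) ** 2)

print(f"{ratio:.2f} ± {ratio_sd:.2f} times faster")
```

In absolute terms that is a difference of roughly 12 seconds on a run of
about five minutes.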

> Only a single CPU core being stressed for that long is a bit sad
> to see.  Checking individual objects should be relatively easy to
> parallelize, shouldn't it?

Yes. The fsck code is pretty old, and uses a very simple way of walking
over all of the packs. index-pack (which backs verify-pack these days)
is much smarter, and runs in parallel. It still takes a lock when doing
the actual fsck checks, but most of the time goes to the zlib inflation
and delta reconstruction.
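
The division of labor described above can be sketched roughly as
follows: worker threads do the expensive inflation in parallel, and only
the comparatively cheap check runs under a lock. This is an illustration
in Python, not index-pack's actual code; `check_object` and `verify_all`
are hypothetical names, and the check itself is a placeholder.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


def check_object(data: bytes) -> bool:
    """Hypothetical stand-in for a per-object fsck check."""
    return data.endswith(b"\n")


def verify_all(compressed_objects, workers=4):
    """Inflate objects in parallel; serialize only the check step."""
    check_lock = threading.Lock()
    bad = []

    def verify(item):
        index, compressed = item
        # Expensive part: runs concurrently across worker threads.
        data = zlib.decompress(compressed)
        # Cheap part: the check and the shared result list are guarded by one lock.
        with check_lock:
            if not check_object(data):
                bad.append(index)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so that any worker exception is raised here.
        list(pool.map(verify, enumerate(compressed_objects)))
    return sorted(bad)


objects = [zlib.compress(b"ok\n"), zlib.compress(b"bad"), zlib.compress(b"fine\n")]
print(verify_all(objects))
```

The lock limits parallelism only for the check itself, which matters
little as long as inflation dominates the run time.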

There's some discussion in:

  https://public-inbox.org/git/20180816210657.GA9291@sigill.intra.peff.net/

and even some patches elsewhere in the thread here:

  https://public-inbox.org/git/20180902075528.GC18787@sigill.intra.peff.net/

and here:

  https://public-inbox.org/git/20180902085503.GA25391@sigill.intra.peff.net/

I think the big show-stopper there is how ugly it is to run the pack
verification in a separate process (and I suspect it is not just ugly
from a code point of view, but actively breaks index-pack because it
then relies on the set of objects seen during the first phase to do its
connectivity check).

So there would probably need to be some lib-ification work on index-pack
first, so that we could call it (at least in verification mode) multiple
times from inside fsck.

-Peff
