git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, steadmon@google.com
Subject: Re: [PATCH 0/7] Better threaded delta resolution in index-pack (another try)
Date: Tue, 25 Aug 2020 17:18:36 -0400
Message-ID: <20200825211836.GA1448402@coredump.intra.peff.net>
In-Reply-To: <20200825181145.1091378-1-jonathantanmy@google.com>

On Tue, Aug 25, 2020 at 11:11:45AM -0700, Jonathan Tan wrote:

> > There may be other cases that get better, though. A 3% increase here is
> > probably OK if we get something for it. But if our primary goal here is
> > increasing multithread efficiency, then we should be able to show some
> > benchmark that improves. :)
> 
> Ah...good question. Cloning from
> https://fuchsia.googlesource.com/third_party/vulkan-cts (mentioned in
> patch 7), cd-ing to the pack dir, and running:
> 
>   git index-pack --stdin -o foo <*.pack
> 
> I got 8m2.878s with my patches and 12m6.365s without. But I ran this on
> a cloud virtual machine (what I have access to right now) so the numbers
> might look different on a dedicated machine.

Thanks, that's a much more interesting example. Here's what I get on my
8-core machine:

  5302.9: index-pack default number of threads   167.70(546.19+12.00)   83.69(585.61+6.95) -50.1%

So that's a considerable improvement, and hardly surprising given the
repository structure. I used the script below to show the size of the
delta families, and the vk-master ones really dominate in both size and
object count (the biggest is 50GB in one delta family).
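
As a sanity check on the -50.1% figure above, here is a minimal sketch
that recomputes the relative change in elapsed time from the two result
columns. It assumes the usual t/perf layout of elapsed(user+sys) and
looks only at the elapsed part; the pct_change helper name is made up
for illustration.

```shell
#!/bin/sh
# Relative change in elapsed time between two t/perf-style results.
# Each argument is assumed to look like "elapsed(user+sys)".
pct_change () {
	awk -v before="$1" -v after="$2" 'BEGIN {
		sub(/\(.*/, "", before)   # drop "(user+sys)", keep elapsed
		sub(/\(.*/, "", after)
		printf "%.1f%%\n", (after - before) / before * 100
	}'
}

# The two columns from the 5302.9 line above.
pct_change '167.70(546.19+12.00)' '83.69(585.61+6.95)'
```

Note that user time actually goes up slightly (546.19 to 585.61); the
win is entirely in wall-clock time.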

I also ran my PERF_EXTRA tests on it to see whether it behaved
differently as the thread count increased. Nope:

  5302.3: index-pack 0 threads                   434.13(425.90+8.16)
  5302.4: index-pack 1 threads                   428.65(421.82+6.77)
  5302.5: index-pack 2 threads                   224.05(424.13+6.21)
  5302.6: index-pack 4 threads                   125.43(457.68+5.77)
  5302.7: index-pack 8 threads                   82.60(579.10+7.78)
  5302.8: index-pack 16 threads                  82.89(1147.82+9.66)
  5302.9: index-pack default number of threads   83.91(576.92+8.52)

Still maxes out at the number of physical cores (not unexpected, but
that was the thing I was most curious about ;) ). I may run it on the
40-core machine, too. It's possible that with the new threading we're
able to do better going past 20 threads. I doubt it, because I think
it's mostly a function of Git's locking granularity, but it's worth
checking.
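
To make the plateau easier to see, here is a rough sketch that turns
the table above into a speedup relative to the 1-thread run, plus the
ratio of CPU time (user+sys) to wall-clock time. It again assumes the
elapsed(user+sys) layout; the scaling function name is just for
illustration.

```shell
#!/bin/sh
# Read t/perf-style lines on stdin and print, for each one, the speedup
# over the "1 threads" line and CPU seconds used per wall-clock second.
scaling () {
	awk '
	{
		split($NF, t, /[(+)]/)      # t[1]=elapsed, t[2]=user, t[3]=sys
		n++
		label[n] = $3               # thread count, or "default"
		elapsed[n] = t[1]
		cpu[n] = t[2] + t[3]
		if ($3 == "1") base = t[1]  # 1-thread run is the baseline
	}
	END {
		for (i = 1; i <= n; i++)
			printf "%-8s speedup %.2fx  cpu/wall %.2f\n",
				label[i], base / elapsed[i], cpu[i] / elapsed[i]
	}'
}

scaling <<'EOF'
5302.3: index-pack 0 threads                   434.13(425.90+8.16)
5302.4: index-pack 1 threads                   428.65(421.82+6.77)
5302.5: index-pack 2 threads                   224.05(424.13+6.21)
5302.6: index-pack 4 threads                   125.43(457.68+5.77)
5302.7: index-pack 8 threads                   82.60(579.10+7.78)
5302.8: index-pack 16 threads                  82.89(1147.82+9.66)
5302.9: index-pack default number of threads   83.91(576.92+8.52)
EOF
```

The 8- and 16-thread rows end up with nearly the same speedup, while
the 16-thread row spends roughly twice the CPU time for it.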

-Peff

-- >8 --
#!/bin/sh
# script to output size, count, and filenames for each delta family

git rev-list --objects --all |
git cat-file --buffer \
  --batch-check='%(objectname) %(deltabase) %(objectsize) %(rest)' |
perl -alne '
  # fields: oid, delta base, size, path; a non-zero base marks a delta
  if ($F[1] =~ /[^0]/) {
    push @{$children{$F[1]}}, $F[0];
  } else {
    push @bases, $F[0];
  }
  $size{$F[0]} = $F[2];
  $name{$F[0]} = $F[3];
  END {
    sub add_to_component {
      my ($oid, $data) = @_;
      $data->{names}->{$name{$oid}}++;
      $data->{size} += $size{$oid};
      $data->{nr}++;
      add_to_component($_, $data) for @{$children{$oid}};
    }
    for my $b (@bases) {
      my $data = { size => 0, nr => 0, names => {} };
      add_to_component($b, $data);
      print join(" ",
                 $data->{size}, $data->{nr},
                 sort keys(%{$data->{names}})
            ), "\n";
    }
  }
' |
sort -rn
