git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/7] speed up pack-objects counting with many packs
Date: Fri, 29 Jul 2016 00:04:23 -0400	[thread overview]
Message-ID: <20160729040422.GA19678@sigill.intra.peff.net> (raw)

This is a follow-up to the patches in

  http://public-inbox.org/git/20160725184938.GA12871@sigill.intra.peff.net/

that are currently queued in jk/pack-objects-optim-skimming. Roughly,
they try to optimize a loop that is O(nr_objects * nr_packs) by breaking
out early in some cases.

I had written those patches a while ago and confirmed that they did
speed up a particular nasty case I had. But when I tried to write a
t/perf test to show off the improvement, I found that they didn't help!
The reason is that the optimizations are heavily dependent on the order
of the packs, and which objects go in which pack. The loop has the same
worst-case complexity as it always did, but we rely on getting lucky to
break out early.

I think the perf test I've included here is more representative of a
real-world workloads, and with an extra optimization, I was able to show
good numbers with it.

The general strategy is to order the pack lookups in most-recently-used
order. This replaces an existing 1-element MRU cache in the normal pack
lookup code, and replaces a straight reverse-chronological iteration in
pack-objects.

All credit for thinking of this scheme goes to Michael Haggerty, who
suggested the idea to me about six months ago. It seemed like a lot of
work at the time, so I didn't do it. :) But as I started to implement
the same 1-element cache in pack-objects, I found that the code actually
gets rather awkward. The MRU solution makes the callers easier to read,
and of course it turns out to be faster, to boot.

Anyway, enough chit-chat. The patches are:

  [1/7]: t/perf: add tests for many-pack scenarios
  [2/7]: sha1_file: drop free_pack_by_name
  [3/7]: add generic most-recently-used list
  [4/7]: find_pack_entry: replace last_found_pack with MRU cache
  [5/7]: pack-objects: break out of want_object loop early
  [6/7]: pack-objects: compute local/ignore_pack_keep early
  [7/7]: pack-objects: use mru list when iterating over packs

The actual optimizations are in patches 4 and 7, which have their own
numbers. But here are end-to-end numbers for the series against the tip
of master (for the meanings, see the discussion in patch 1, and the
analysis in 4 and 7):

[p5303, linux.git]
Test                      origin                HEAD
-------------------------------------------------------------------------
5303.3: rev-list (1)      31.48(31.20+0.27)     31.18(30.95+0.22) -1.0%
5303.4: repack (1)        40.74(39.27+2.56)     40.30(38.96+2.47) -1.1%
5303.6: rev-list (50)     31.65(31.38+0.26)     31.26(31.02+0.23) -1.2%
5303.7: repack (50)       60.90(71.03+2.13)     46.95(57.46+1.85) -22.9%
5303.9: rev-list (1000)   38.63(38.25+0.37)     31.91(31.61+0.28) -17.4%
5303.10: repack (1000)    392.52(467.09+5.05)   87.38(159.98+2.92) -77.7%

[p5303, git.git]
Test                      origin              HEAD
---------------------------------------------------------------------
5303.3: rev-list (1)      1.55(1.54+0.00)     1.56(1.54+0.01) +0.6%
5303.4: repack (1)        1.83(1.82+0.06)     1.82(1.82+0.05) -0.5%
5303.6: rev-list (50)     1.58(1.56+0.02)     1.58(1.57+0.00) +0.0%
5303.7: repack (50)       2.50(3.16+0.04)     2.32(2.92+0.09) -7.2%
5303.9: rev-list (1000)   2.64(2.61+0.02)     2.23(2.21+0.01) -15.5%
5303.10: repack (1000)    12.68(19.07+0.30)   7.51(13.86+0.20) -40.8%

For curiosity, I also ran the git.git case with 10,000 packs. This is
even more silly, but it shows that the problem does get worse and worse
as the number grows, but that the patches do continue to help:

Test                        origin               HEAD
-------------------------------------------------------------------------
5303.12: rev-list (10,000)  26.00(25.86+0.13)    15.76(15.62+0.13) -39.4%
5303.13: repack (10,000)   164.11(175.30+1.34)   51.48(62.96+1.18) -68.6%

-Peff

             reply	other threads:[~2016-07-29  4:04 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-29  4:04 Jeff King [this message]
2016-07-29  4:06 ` [PATCH v2 1/7] t/perf: add tests for many-pack scenarios Jeff King
2016-07-29  4:06 ` [PATCH v2 2/7] sha1_file: drop free_pack_by_name Jeff King
2016-07-29  4:06 ` [PATCH v2 3/7] add generic most-recently-used list Jeff King
2016-07-29  4:09 ` [PATCH v2 4/7] find_pack_entry: replace last_found_pack with MRU cache Jeff King
2016-07-29  4:10 ` [PATCH v2 5/7] pack-objects: break out of want_object loop early Jeff King
2016-07-29  4:11 ` [PATCH v2 6/7] pack-objects: compute local/ignore_pack_keep early Jeff King
2016-07-29  4:15 ` [PATCH v2 7/7] pack-objects: use mru list when iterating over packs Jeff King
2016-07-29  5:45   ` Jeff King
2016-07-29 15:02     ` Junio C Hamano
2016-08-08 14:50       ` Jeff King
2016-08-08 16:28         ` Junio C Hamano
2016-08-08 16:51           ` Jeff King
2016-08-08 17:16             ` Junio C Hamano
2016-08-09 14:04               ` Jeff King
2016-08-09 17:45                 ` Jeff King
2016-08-09 18:06                   ` Junio C Hamano
2016-08-09 22:29                 ` Junio C Hamano
2016-08-10 11:52                   ` [PATCH v3 0/2] pack-objects mru Jeff King
2016-08-10 12:02                     ` [PATCH v3 1/2] pack-objects: break delta cycles before delta-search phase Jeff King
2016-08-10 20:17                       ` Junio C Hamano
2016-08-11  5:02                         ` Jeff King
2016-08-11  5:15                           ` [PATCH v4 " Jeff King
2016-08-11  6:57                           ` [PATCH v3 " Jeff King
2016-08-11  9:20                             ` [PATCH v5] pack-objects mru Jeff King
2016-08-11  9:24                               ` [PATCH v5 1/4] provide an initializer for "struct object_info" Jeff King
2016-08-11  9:25                               ` [PATCH v5 2/4] sha1_file: make packed_object_info public Jeff King
2016-08-11  9:26                               ` [PATCH v5 3/4] pack-objects: break delta cycles before delta-search phase Jeff King
2016-08-11  9:26                               ` [PATCH v5 4/4] pack-objects: use mru list when iterating over packs Jeff King
2016-08-11  9:57                               ` [PATCH v5] pack-objects mru Jeff King
2016-08-11 15:11                                 ` Junio C Hamano
2016-08-11 16:19                                   ` Jeff King
2016-08-10 12:03                     ` [PATCH v3 2/2] pack-objects: use mru list when iterating over packs Jeff King
2016-08-10 16:47                     ` [PATCH v3 0/2] pack-objects mru Junio C Hamano
2016-08-11  4:48                       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160729040422.GA19678@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).