git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: 2.18.0 Regression: packing performance and effectiveness
Date: Thu, 19 Jul 2018 08:27:38 -0700	[thread overview]
Message-ID: <CABPp-BGbFSv3uyHusgKqaFAW4acXdf2Cj-s4FyZLiDU_brpK1Q@mail.gmail.com> (raw)
In-Reply-To: <CACsJy8Cc=h3ptDaW=oUGXqRZAvNU8-pUCe1cVFpRd0t8O1a0Gw@mail.gmail.com>

On Wed, Jul 18, 2018 at 10:41 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Thu, Jul 19, 2018 at 12:51 AM Elijah Newren <newren@gmail.com> wrote:
>>
>> I had a user report some poor behavior of 'git gc --aggressive' on a
>> certain repo (which I sadly cannot share).  Turns out that on this
>> repo, this operation takes about 60% longer and produces a pack
>> roughly twice the expected size.
>
> The intention was to make life better for weaker machines but
> definitely should not slow down beefier ones, so yes this is
> definitely a regression.

Not sure if it matters, but the original discovery was on a laptop.
Probably 4 cores and 16 GB RAM.  I duplicated on my workstation (8
cores, 24 GB RAM).  However, since the original patch was memory
related and I noticed the repacking using all available memory, I
repeated the testcases while measuring memory but did it on a machine
that wouldn't be memory limited.

> Is it possible to share "verify-pack -v <pack file>" output of the
> pack produced by 2.17.0 and 2.18.0? The only sensitive info there is
> sha-1, which you can replace with just "SHA-1" if you want. I'm more
> interested in delta sizes and distribution.

For the deltas, assuming the size-in-pack field (4th column) is the relevant one

Number of objects in repo (lines of verify-pack output): 4460755
Number of deltas: 2831384
Number of deltas greater than 1MB: 72
Min: 14
Median: 47
Mean: 586
99.999 percentile: 11366280.6 (10.8 MB)
Max: 141664210 (135.1 MB)

If the size field (3rd column) is the relevant one, then the numbers
change slightly to:

Number of deltas greater than 1MB: 101
Min: 4
Median: 33
Mean: 660
99.999 percentile: 12245551.7 (11.7 MB)
Max: 144658342 (138.0 MB)

I checked out the blob which had the biggest delta, as well as the
blob it was a delta against.  One was a 280 MB file, the other a 278
MB file.

  parent reply	other threads:[~2018-07-19 15:27 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-18 22:51 2.18.0 Regression: packing performance and effectiveness Elijah Newren
2018-07-18 22:51 ` [RFC PATCH] fix-v1: revert "pack-objects: shrink delta_size field in struct object_entry" Elijah Newren
2018-07-18 22:51 ` [RFC PATCH] fix-v2: make OE_DELTA_SIZE_BITS a bit bigger Elijah Newren
2018-07-19  5:41 ` 2.18.0 Regression: packing performance and effectiveness Duy Nguyen
2018-07-19  5:49   ` Jeff King
2018-07-19 15:27   ` Elijah Newren [this message]
2018-07-19 15:43     ` Duy Nguyen
2018-07-19  5:44 ` Jeff King
2018-07-19  5:57   ` Duy Nguyen
2018-07-19 15:16     ` Duy Nguyen
2018-07-19 16:42       ` Elijah Newren
2018-07-19 17:23         ` Jeff King
2018-07-19 17:31           ` Duy Nguyen
2018-07-19 18:24             ` Duy Nguyen
2018-07-19 19:17               ` Jeff King
2018-07-19 23:11               ` Elijah Newren
2018-07-20  5:28                 ` Jeff King
2018-07-20  5:30                   ` Jeff King
2018-07-20  5:47                   ` Duy Nguyen
2018-07-20 17:21                   ` Elijah Newren
2018-07-19 17:04       ` Jeff King
2018-07-19 19:25       ` Junio C Hamano
2018-07-19 19:27         ` Junio C Hamano
2018-07-20 15:39 ` [PATCH] pack-objects: fix performance issues on packing large deltas Nguyễn Thái Ngọc Duy
2018-07-20 17:40   ` Jeff King
2018-07-21  4:23     ` Duy Nguyen
2018-07-23 21:37       ` Jeff King
2018-07-20 17:43   ` Elijah Newren
2018-07-20 23:52     ` Elijah Newren
2018-07-21  4:07       ` Duy Nguyen
2018-07-21  7:08         ` Duy Nguyen
2018-07-21  4:47     ` Duy Nguyen
2018-07-21  6:56       ` Elijah Newren
2018-07-21  7:14         ` Duy Nguyen
2018-07-22  6:22       ` Elijah Newren
2018-07-22  6:49         ` Duy Nguyen
2018-07-23 12:34     ` Elijah Newren
2018-07-23 15:50       ` Duy Nguyen
2018-07-22  8:04   ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-07-23 18:04     ` Junio C Hamano
2018-07-23 18:38       ` Duy Nguyen
2018-07-23 18:49         ` Duy Nguyen
2018-07-23 21:30           ` Jeff King
2018-07-26  8:12     ` Johannes Sixt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BGbFSv3uyHusgKqaFAW4acXdf2Cj-s4FyZLiDU_brpK1Q@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).