git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH] pack-objects: fix performance issues on packing large deltas
Date: Sat, 21 Jul 2018 06:07:26 +0200	[thread overview]
Message-ID: <CACsJy8Ck7U3m8khFdiKDzYWKk1ZcKyi32RkQ=a=oFQYHd5bruQ@mail.gmail.com> (raw)
In-Reply-To: <CABPp-BF5O2-DQMSjN67HYsHHYHP_VH-N-C=k796NwPTvtwf7gQ@mail.gmail.com>

On Sat, Jul 21, 2018 at 1:52 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Fri, Jul 20, 2018 at 10:43 AM, Elijah Newren <newren@gmail.com> wrote:
> > On Fri, Jul 20, 2018 at 8:39 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>
> >> This patch provides a better fallback that is
> >>
> >> - cheaper in terms of cpu and io because we won't have to read
> >>   existing pack files as much
> >>
> >> - better in terms of pack size because the pack heuristics is back to
> >>   2.17.0 time, we do not drop large deltas at all
> >>
> >> If we encounter any delta (on-disk or created during try_delta phase)
> >> that is larger than the 2MB limit, we stop using delta_size_ field for
> >> this because it can't contain such size anyway. A new array of delta
> >> size is dynamically allocated and can hold all the deltas that 2.17.0
> >> can [1]. All current delta sizes are migrated over.
> >>
> >> With this, we do not have to drop deltas in try_delta() anymore. Of
> >> course the downside is we use slightly more memory, even compared to
> >> 2.17.0. But since this is considered an uncommon case, a bit more
> >> memory consumption should not be a problem.
>
> Is the increased memory only supposed to affect the uncommon case?  I
> was able to verify that 2.18.0 used less memory than 2.17.0 (for
> either my repo or linux.git), but I don't see nearly the same memory
> reduction relative to 2.17.0 for linux.git with your new patches.
>
> >> ---
> >>  I'm optimistic that the slowness on linux repo is lock contention
> >>  (because if it's not then I have no clue what is). So let's start
> >>  doing proper patches.
> >
> > I'll be testing this shortly...
>
> Using these definitions for versions:
>
>   fix-v5: https://public-inbox.org/git/20180720052829.GA3852@sigill.intra.peff.net/
> (plus what it depends on)
>   fix-v6: The patch I'm responding to
>   fix-v2: https://public-inbox.org/git/20180718225110.17639-3-newren@gmail.com/
>
> and inspired by
> https://public-inbox.org/git/87po42cwql.fsf@evledraar.gmail.com/, I
> ran
>
>    /usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive
>
> three times on a beefy box (40 cores, 160GB ram, otherwise quiet
> system).  If I take the lowest MaxRSS, and the lowest time reported
> (regardless of whether from the same run), then the results are as
> follows for linux.git (all cases repacked down to 1279 MB, so removing
> that from the table):
>
> Version  MaxRSS(kB)  Time (s)
> -------  ----------  --------
>  2.17.0   11413236    621.36
>  2.18.0   10987152    621.80
>  fix-v5   11105564    836.29
>  fix-v6   11357304    831.73
>  fix-v2   10873348    619.96
>
> The runtime could swing up to 10 seconds between runs.  The memory use
> also had some variability, but all runs of fix-v2 had lower MaxRSS
> than any runs of 2.18.0 (which surprised me); all runs of 2.18.0 used
> less memory than any run of fix-v6, and all runs of fix-v6 used less
> memory than any run of 2.17.0.  fix-v5 had the most variabiilty in
> MaxRSS, ranging from as low as some 2.18.0 runs up to higher than any
> 2.17.0 runs.

As Peff said, RSS is not a very good way to measure memory usage.
valgrind --tool=massif would be the best, or just grep "heap" in
/proc/$PID/smaps. fix-v2 should increase struct object_entry's size
and memory usage for all repos, while v6 has the same memory usage as
2.17.0 in heap, and extra once it hits a big delta.

> ...but maybe that'd change with more than 3 runs of each?
>
> Anyway, this is a lot better than the 1193.67 seconds I saw with
> fix-v4 (https://public-inbox.org/git/20180719182442.GA5796@duynguyen.home/,
> which Peff fixed up into fix-v5), suggesting it is lock contention,
> but we're still about 33% slower than 2.17.0 and we've lost almost all
> of the memory savings.  Somewhat surprisingly, the highly simplistic
> fix-v2 does a lot better on both measures.

Maybe we should avoid locking when the delta is small enough then and
see how it goes. This is linux.git so there's no big delta and we
would end up not locking at all.

If you do "git repack -adf --threads=8", does the time difference
between v6 and 2.17.0 reduce (suggesting we still hit lock contention
hard)?

> However, I'm just concentrating on a beefy machine; it may be that v6
> drastically outperforms v2 on weaker hardware?  Can others measure a
> lower memory usage for v6 than v2?

I'll try it with massif on linux.git, but this is a very weak laptop
(4GB) so  it'll take a while....

> Also, for the original repo with the huge deltas that started it all,
> the numbers are (only run once since it takes so long):
>
> Version  Pack (MB)  MaxRSS(kB)  Time (s)
> -------  ---------  ----------  --------
>  2.17.0     5498     43513628    2494.85
>  2.18.0    10531     40449596    4168.94
>  fix-v5     5500     42805716    2577.50
>  fiv-v6     5509     42814836    2605.36
>  fix-v2     5509     41644104    2468.25
>
>
> >>  If we want a quick fix for 2.18.1. I suggest bumping up
> >>  OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
> >>  think it's safe to fast track this patch.
> >
> > Okay, I'll submit that with a proper commit message then.  Since you
> > also bump OE_DELTA_SIZE_BITS in your patch (though to a different
> > value), it'll require a conflict resolution when merging maint into
> > master, but at least the change won't silently leak through.
>
> ...except now I'm wondering if the commit message should mention that
> it's just a stopgap fix for 2.18.1, or whether it's actually the fix
> that we just want to use going forward as well.

no fix-v2 is really a stopgap. With 2.17.0 (and fix-v6) we pay 80
bytes per object, or 6.5M * 80 = 520 MB heap. fix-v2 increases this
struct to 88 bytes so we pay 52 MB extra.
-- 
Duy

  reply	other threads:[~2018-07-21  4:08 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-18 22:51 2.18.0 Regression: packing performance and effectiveness Elijah Newren
2018-07-18 22:51 ` [RFC PATCH] fix-v1: revert "pack-objects: shrink delta_size field in struct object_entry" Elijah Newren
2018-07-18 22:51 ` [RFC PATCH] fix-v2: make OE_DELTA_SIZE_BITS a bit bigger Elijah Newren
2018-07-19  5:41 ` 2.18.0 Regression: packing performance and effectiveness Duy Nguyen
2018-07-19  5:49   ` Jeff King
2018-07-19 15:27   ` Elijah Newren
2018-07-19 15:43     ` Duy Nguyen
2018-07-19  5:44 ` Jeff King
2018-07-19  5:57   ` Duy Nguyen
2018-07-19 15:16     ` Duy Nguyen
2018-07-19 16:42       ` Elijah Newren
2018-07-19 17:23         ` Jeff King
2018-07-19 17:31           ` Duy Nguyen
2018-07-19 18:24             ` Duy Nguyen
2018-07-19 19:17               ` Jeff King
2018-07-19 23:11               ` Elijah Newren
2018-07-20  5:28                 ` Jeff King
2018-07-20  5:30                   ` Jeff King
2018-07-20  5:47                   ` Duy Nguyen
2018-07-20 17:21                   ` Elijah Newren
2018-07-19 17:04       ` Jeff King
2018-07-19 19:25       ` Junio C Hamano
2018-07-19 19:27         ` Junio C Hamano
2018-07-20 15:39 ` [PATCH] pack-objects: fix performance issues on packing large deltas Nguyễn Thái Ngọc Duy
2018-07-20 17:40   ` Jeff King
2018-07-21  4:23     ` Duy Nguyen
2018-07-23 21:37       ` Jeff King
2018-07-20 17:43   ` Elijah Newren
2018-07-20 23:52     ` Elijah Newren
2018-07-21  4:07       ` Duy Nguyen [this message]
2018-07-21  7:08         ` Duy Nguyen
2018-07-21  4:47     ` Duy Nguyen
2018-07-21  6:56       ` Elijah Newren
2018-07-21  7:14         ` Duy Nguyen
2018-07-22  6:22       ` Elijah Newren
2018-07-22  6:49         ` Duy Nguyen
2018-07-23 12:34     ` Elijah Newren
2018-07-23 15:50       ` Duy Nguyen
2018-07-22  8:04   ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-07-23 18:04     ` Junio C Hamano
2018-07-23 18:38       ` Duy Nguyen
2018-07-23 18:49         ` Duy Nguyen
2018-07-23 21:30           ` Jeff King
2018-07-26  8:12     ` Johannes Sixt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACsJy8Ck7U3m8khFdiKDzYWKk1ZcKyi32RkQ=a=oFQYHd5bruQ@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).