git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH] pack-objects: fix performance issues on packing large deltas
Date: Sat, 21 Jul 2018 06:23:32 +0200	[thread overview]
Message-ID: <CACsJy8AcDqNtV1VfHB+ZD=wvOxRyhobacreaNpjSm3NcLCMKWA@mail.gmail.com> (raw)
In-Reply-To: <20180720174004.GA22486@sigill.intra.peff.net>

On Fri, Jul 20, 2018 at 7:40 PM Jeff King <peff@peff.net> wrote:
>
> On Fri, Jul 20, 2018 at 05:39:43PM +0200, Nguyễn Thái Ngọc Duy wrote:
>
> > Let's start with some background about oe_delta_size() and
> > oe_set_delta_size(). If you already know, skip the next paragraph.
> >
> > These two are added in 0aca34e826 (pack-objects: shrink delta_size
> > field in struct object_entry - 2018-04-14) to help reduce 'struct
> > object_entry' size. The delta size field in this struct is reduced to
> > only contain max 2MB. So if any new delta is produced and larger than
> > 2MB, it's dropped because we can't really save such a large size
> > anywhere. Fallback is provided in case existingpackfiles already have
> > large deltas, then we can retrieve it from the pack.
>
> Minor nit, but isn't this 1MB (it was 2MB after one of your patches, but
> I think v2.18.0 has 20 bits)?

Argh.. I think I thought "2 ** 20" in my mind then typed "2 << 20" in
python. And I thought I made a mistake in my previous commit message
because it does mention 1MB...

> > With this, we do not have to drop deltas in try_delta() anymore. Of
> > course the downside is we use slightly more memory, even compared to
> > 2.17.0. But since this is considered an uncommon case, a bit more
> > memory consumption should not be a problem.
>
> I wondered how common this might be. The easiest way to see the largest
> delta sizes is:
>
>   git cat-file --batch-all-objects \
>                --batch-check='%(objectsize:disk) %(deltabase)' |
>   grep -v 0000000000000000000000000000000000000000 |
>   sort -rn | head
>
> The biggest one in the kernel is ~300k. Which is about what I'd expect
> for a normal source code repo. Even some private repos I have with a lot
> of binary artifacts top out at about 3MB. So the new 32MB is probably

I'll keep these numbers in v2 commit message, easier to find later.

> > [1] With a small tweak. 2.17.0 on 64-bit linux can hold 2^64 byte
> >     deltas, which is absolutely insane. But windows builds, even
> >     64-bit version, only hold 2^32. So reducing it to 2^32 should be
> >     quite safe.
>
> I'm not sure I completely agree with this. While 4GB deltas should be
> pretty rare, the nice thing about 64-bit is that you never have to even
> think about whether the variable is large enough. I think the 2^32
> limitations on Windows are something we should be fixing in the long
> term (though there it is even worse because it is not just deltas, but
> entire objects).

I guess that means we stick to uint64_t then? It does increase more
memory on 32-bit architecture (and probably processing cost as well
because 64-bit uses up 2 registers).

> > @@ -2278,6 +2274,8 @@ static void init_threaded_search(void)
> >       pthread_mutex_init(&cache_mutex, NULL);
> >       pthread_mutex_init(&progress_mutex, NULL);
> >       pthread_cond_init(&progress_cond, NULL);
> > +     pthread_mutex_init(&to_pack.lock, NULL);
> > +     to_pack.lock_initialized = 1;
> >       old_try_to_free_routine = set_try_to_free_routine(try_to_free_from_threads);
> >  }
>
> This is new in this iteration. I guess this is to cover the case where
> we are built with pthread support, but --threads=1?

If you mean the "lock_initialized" variable, not really. the other
_lock() macros in builtin/ call pthread_mutex_lock() unconditionally,
which is fine. But I feel a bit uncomfortable doing the same in
pack-objects.h which technically is library code (but yes practically
is a long arm of builtin/pack-objects.c), so lock_initialized is there
to make sure we don't touch uninitialized locks if somebody forgets to
init them first.

> Given that we no longer have to touch this lock during the realloc, is
> it worth actually putting it into to_pack? Instead, we could keep it
> local to pack-objects, alongside all the other locks (and use the
> lock_mutex() helper which handles the single-thread case).

You probably notice the lock name is not "delta_size_lock". I intended
to reuse this for locking other fields in struct packing_data as well.
But that might be a bad idea.

I have no strong opinion about this, so if we still end up locking the
whole functions, I'll just move the lock back close to the others in
builtin/pack-objects.c

> Your original patch had to copy the oe_* helpers into the file to handle
> that. But I think we're essentially just locking the whole functions:

I'll try to avoid this lock when deltas are small and see if it helps
the linux.git case on Elijah's machine. So we may end up locking just
a part of these functions.
-- 
Duy

  reply	other threads:[~2018-07-21  4:24 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-18 22:51 2.18.0 Regression: packing performance and effectiveness Elijah Newren
2018-07-18 22:51 ` [RFC PATCH] fix-v1: revert "pack-objects: shrink delta_size field in struct object_entry" Elijah Newren
2018-07-18 22:51 ` [RFC PATCH] fix-v2: make OE_DELTA_SIZE_BITS a bit bigger Elijah Newren
2018-07-19  5:41 ` 2.18.0 Regression: packing performance and effectiveness Duy Nguyen
2018-07-19  5:49   ` Jeff King
2018-07-19 15:27   ` Elijah Newren
2018-07-19 15:43     ` Duy Nguyen
2018-07-19  5:44 ` Jeff King
2018-07-19  5:57   ` Duy Nguyen
2018-07-19 15:16     ` Duy Nguyen
2018-07-19 16:42       ` Elijah Newren
2018-07-19 17:23         ` Jeff King
2018-07-19 17:31           ` Duy Nguyen
2018-07-19 18:24             ` Duy Nguyen
2018-07-19 19:17               ` Jeff King
2018-07-19 23:11               ` Elijah Newren
2018-07-20  5:28                 ` Jeff King
2018-07-20  5:30                   ` Jeff King
2018-07-20  5:47                   ` Duy Nguyen
2018-07-20 17:21                   ` Elijah Newren
2018-07-19 17:04       ` Jeff King
2018-07-19 19:25       ` Junio C Hamano
2018-07-19 19:27         ` Junio C Hamano
2018-07-20 15:39 ` [PATCH] pack-objects: fix performance issues on packing large deltas Nguyễn Thái Ngọc Duy
2018-07-20 17:40   ` Jeff King
2018-07-21  4:23     ` Duy Nguyen [this message]
2018-07-23 21:37       ` Jeff King
2018-07-20 17:43   ` Elijah Newren
2018-07-20 23:52     ` Elijah Newren
2018-07-21  4:07       ` Duy Nguyen
2018-07-21  7:08         ` Duy Nguyen
2018-07-21  4:47     ` Duy Nguyen
2018-07-21  6:56       ` Elijah Newren
2018-07-21  7:14         ` Duy Nguyen
2018-07-22  6:22       ` Elijah Newren
2018-07-22  6:49         ` Duy Nguyen
2018-07-23 12:34     ` Elijah Newren
2018-07-23 15:50       ` Duy Nguyen
2018-07-22  8:04   ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-07-23 18:04     ` Junio C Hamano
2018-07-23 18:38       ` Duy Nguyen
2018-07-23 18:49         ` Duy Nguyen
2018-07-23 21:30           ` Jeff King
2018-07-26  8:12     ` Johannes Sixt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACsJy8AcDqNtV1VfHB+ZD=wvOxRyhobacreaNpjSm3NcLCMKWA@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).