git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com
Subject: Re: [PATCH v2 8/8] builtin/repack.c: add '--geometric' option
Date: Wed, 17 Feb 2021 13:17:16 -0500	[thread overview]
Message-ID: <YC1drGrIEg0C7Zo5@coredump.intra.peff.net> (raw)
In-Reply-To: <d5561585c2221a9635eb0fc7a65298ee8a2b6348.1612411124.git.me@ttaylorr.com>

On Wed, Feb 03, 2021 at 10:59:25PM -0500, Taylor Blau wrote:

> Often it is useful to both:
> 
>   - have relatively few packfiles in a repository, and
> 
>   - avoid having so few packfiles in a repository that we repack its
>     entire contents regularly
> 
> This patch implements a '--geometric=<n>' option in 'git repack'. This
> allows the caller to specify that they would like each pack to be at
> least a factor times as large as the previous largest pack (by object
> count).
> 
> Concretely, say that a repository has 'n' packfiles, labeled P1, P2,
> ..., up to Pn. Each packfile has an object count equal to 'objects(Pn)'.
> With a geometric factor of 'r', it should be that:
> 
>   objects(Pi) > r*objects(P(i-1))
> 
> for all i in [1, n], where the packs are sorted by
> 
>   objects(P1) <= objects(P2) <= ... <= objects(Pn).

Just devil's advocating for a moment.

I think in this kind of geometric roll-up strategy, you want to imagine
that you are rolling up recent pushes but leaving untouched a good
"base" pack that you previously created.

And that will usually be true if you are doing the rollup based on
number of objects (or size, etc). But it won't always be (e.g., for some
reason somebody makes a very large push relative to the current
repository size). What happens when this assumption is violated?

In some ways, it is a good thing to drift away from this "base pack"
view of the world. If we're trying to amortize the per-object work done,
then we are better off rolling up the small things into the large,
regardless of where they came from.

But the base pack may also have other properties we want to retain. Two
I can think of:

  - it may have a .bitmap that we'll be throwing away, without
    generating a new one. I know that your end-game involves writing a
    midx with bitmaps that covers all of the packs, so this would become
    a non-issue in that strategy.

  - it may have been more carefully packed (e.g., with a larger window
    size, using "-f", etc) than the packs we got from pushes. We do
    _mostly_ retain the deltas when we roll up the packs, so it probably
    only has a small impact in practice (I'd expect in a few cases we'd
    throw away deltas because a pushed pack contains a duplicate of its
    base object that we added via --fix-thin).

So I suspect it's probably OK in practice. These cases would happen
rarely, and the impact would not be all that big. The bitmap thing I'd
worry the most about. As part of a larger strategy involving a midx it
is taken care of, but people using just this new feature may not realize
that. The bitmaps of course are "just" an optimization, but it's hard to
say how dire things are when they don't exist. For many situations,
probably not very dire. But I know that on our servers, when repos lack
bitmaps, people notice the performance degradation.

On the other hand, by definition this happens in a case where there are
more objects that have just been pushed (and are therefore not
bitmapped) than existed already. So you _already_ have a performance
problem either way until you get bitmap coverage of those new objects.

> --- a/Documentation/git-repack.txt
> +++ b/Documentation/git-repack.txt
> @@ -165,6 +165,17 @@ depth is 4095.
>  	Pass the `--delta-islands` option to `git-pack-objects`, see
>  	linkgit:git-pack-objects[1].
>  
> +-g=<factor>::
> +--geometric=<factor>::
> +	Arrange resulting pack structure so that each successive pack
> +	contains at least `<factor>` times the number of objects as the
> +	next-largest pack.
> ++
> +`git repack` ensures this by determining a "cut" of packfiles that need to be
> +repacked into one in order to ensure a geometric progression. It picks the
> +smallest set of packfiles such that as many of the larger packfiles (by count of
> +objects contained in that pack) may be left intact.

I think we might need to make clear in the documentation how this
differs from other repacks, in that it is not considering reachability
at all. I like the term "roll up" to describe what is happening, but we
probably need to define that term clearly, as well.

Especially important, I think, is that we talk about what's happening
with loose objects, which are part of the rollup here. And IMHO we
should make clear that for now we include them all, without
consideration of their reachability, but that this may change in the
future.

Likewise, are there any options that are incompatible with "-g"? I have
to imagine that "--write-bitmap-index" would not work very well. I don't
know that we need to enumerate them all, but I'm wondering if a blanket
"this may not play well with other options" warning may be advisable.

> +static void split_pack_geometry(struct pack_geometry *geometry, int factor)
> [...]

I'll admit I didn't carefully think about the math of the progression
here. IMHO the exact split is the least interesting part of this whole
series (compared to the general idea of "rolling up some packs" versus a
whole repack). Between the comments and the tests, I'll assume it's
generally behaving as advertised. (I of course did look for any obvious
coding errors, but didn't see any).

-Peff

  reply	other threads:[~2021-02-17 18:18 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 23:23 [PATCH 00/10] repack: support repacking into a geometric sequence Taylor Blau
2021-01-19 23:24 ` [PATCH 01/10] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-01-20 13:40   ` Derrick Stolee
2021-01-20 14:38     ` Taylor Blau
2021-01-29  2:33   ` Junio C Hamano
2021-01-29 18:38     ` Taylor Blau
2021-01-29 19:31     ` Jeff King
2021-01-29 20:20       ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 02/10] revision: learn '--no-kept-objects' Taylor Blau
2021-01-29  3:10   ` Junio C Hamano
2021-01-29 19:13     ` Taylor Blau
2021-01-19 23:24 ` [PATCH 03/10] builtin/pack-objects.c: learn '--assume-kept-packs-closed' Taylor Blau
2021-01-29  3:21   ` Junio C Hamano
2021-01-29 19:19     ` Jeff King
2021-01-29 20:01       ` Taylor Blau
2021-01-29 20:25         ` Jeff King
2021-01-29 22:10           ` Taylor Blau
2021-01-29 22:57             ` Jeff King
2021-01-29 23:03             ` Junio C Hamano
2021-01-29 23:28               ` Taylor Blau
2021-02-02  3:04                 ` Taylor Blau
2021-01-29 23:31               ` Jeff King
2021-01-29 22:13           ` Junio C Hamano
2021-01-29 20:30       ` Junio C Hamano
2021-01-29 22:43         ` Jeff King
2021-01-29 22:53           ` Taylor Blau
2021-01-29 23:00             ` Jeff King
2021-01-29 23:10             ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 04/10] p5303: add missing &&-chains Taylor Blau
2021-01-19 23:24 ` [PATCH 05/10] p5303: measure time to repack with keep Taylor Blau
2021-01-29  3:40   ` Junio C Hamano
2021-01-29 19:32     ` Jeff King
2021-01-29 20:04       ` [PATCH] p5303: avoid sed GNU-ism Jeff King
2021-01-29 20:19         ` Eric Sunshine
2021-01-29 20:27           ` Jeff King
2021-01-29 20:36             ` Eric Sunshine
2021-01-29 22:11               ` Taylor Blau
2021-01-29 20:38       ` [PATCH 05/10] p5303: measure time to repack with keep Junio C Hamano
2021-01-29 22:10         ` Jeff King
2021-01-29 23:12           ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 06/10] pack-objects: rewrite honor-pack-keep logic Taylor Blau
2021-01-19 23:24 ` [PATCH 07/10] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-01-19 23:24 ` [PATCH 08/10] builtin/pack-objects.c: teach '--keep-pack-stdin' Taylor Blau
2021-01-19 23:24 ` [PATCH 09/10] builtin/repack.c: extract loose object handling Taylor Blau
2021-01-20 13:59   ` Derrick Stolee
2021-01-20 14:34     ` Taylor Blau
2021-01-20 15:51       ` Derrick Stolee
2021-01-21  3:45     ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 10/10] builtin/repack.c: add '--geometric' option Taylor Blau
2021-01-20 14:05 ` [PATCH 00/10] repack: support repacking into a geometric sequence Derrick Stolee
2021-02-04  3:58 ` [PATCH v2 0/8] " Taylor Blau
2021-02-04  3:58   ` [PATCH v2 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-16 21:42     ` Jeff King
2021-02-16 21:48       ` Taylor Blau
2021-02-04  3:58   ` [PATCH v2 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-16 23:17     ` Jeff King
2021-02-17 18:35       ` Taylor Blau
2021-02-04  3:59   ` [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-16 23:46     ` Jeff King
2021-02-17 18:59       ` Taylor Blau
2021-02-17 19:21         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-04  3:59   ` [PATCH v2 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-16 23:58     ` Jeff King
2021-02-17  0:02       ` Jeff King
2021-02-17 19:13       ` Taylor Blau
2021-02-17 19:25         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-17 16:05     ` Jeff King
2021-02-17 19:23       ` Taylor Blau
2021-02-17 19:29         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-17 17:11     ` Jeff King
2021-02-17 19:54       ` Taylor Blau
2021-02-17 20:25         ` Jeff King
2021-02-17 20:29           ` Taylor Blau
2021-02-17 21:43             ` Jeff King
2021-02-04  3:59   ` [PATCH v2 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-17 18:17     ` Jeff King [this message]
2021-02-17 20:01       ` Taylor Blau
2021-02-17  0:01   ` [PATCH v2 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-17 18:18     ` Jeff King
2021-02-18  3:14 ` [PATCH v3 " Taylor Blau
2021-02-18  3:14   ` [PATCH v3 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-18  3:14   ` [PATCH v3 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-18  3:14   ` [PATCH v3 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-18  3:14   ` [PATCH v3 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-18  3:14   ` [PATCH v3 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-18  3:14   ` [PATCH v3 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-18  3:14   ` [PATCH v3 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-18  3:14   ` [PATCH v3 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-23  0:31   ` [PATCH v3 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23  1:06     ` Taylor Blau
2021-02-23  1:42       ` Jeff King
2021-02-23  2:24 ` [PATCH v4 " Taylor Blau
2021-02-23  2:25   ` [PATCH v4 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-23  2:25   ` [PATCH v4 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-23  2:25   ` [PATCH v4 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-23  8:07     ` Junio C Hamano
2021-02-23 18:51       ` Jeff King
2021-02-23  2:25   ` [PATCH v4 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-23  2:25   ` [PATCH v4 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-23  2:25   ` [PATCH v4 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-23  2:25   ` [PATCH v4 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-23  2:25   ` [PATCH v4 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-24 23:19     ` Junio C Hamano
2021-02-24 23:43       ` Junio C Hamano
2021-03-04 21:40         ` Taylor Blau
2021-03-04 21:55       ` Taylor Blau
2021-02-23  3:39   ` [PATCH v4 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23  7:43   ` Junio C Hamano
2021-02-23 18:44     ` Jeff King
2021-02-23 19:54       ` Martin Fick
2021-02-23 20:06         ` Taylor Blau
2021-02-23 21:57           ` Martin Fick
2021-02-23 20:15         ` Jeff King
2021-02-23 21:41           ` Martin Fick
2021-02-23 21:53             ` Jeff King
2021-02-24 18:13               ` Martin Fick
2021-02-26  6:23                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YC1drGrIEg0C7Zo5@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).