git@vger.kernel.org mailing list mirror (one of many)
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, "Jeff King" <peff@peff.net>,
	"Junio C Hamano" <gitster@pobox.com>,
	"René Scharfe" <l.s.r@web.de>
Subject: Re: [PATCH v2 2/4] midx-write.c: factor out common want_included_pack() routine
Date: Tue, 2 Apr 2024 13:47:38 +0200
Message-ID: <ZgvwWlvrw8MpC_AZ@tanuki>
In-Reply-To: <0064e363c0cc3288346585a6b4340444ce7b863c.1712006190.git.me@ttaylorr.com>

On Mon, Apr 01, 2024 at 05:16:38PM -0400, Taylor Blau wrote:
> When performing a 'git multi-pack-index repack', the MIDX machinery
> tries to aggregate MIDX'd packs together either to (a) fill the given
> `--batch-size` argument, or (b) combine all packs together.
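A toy sketch of the two modes. Everything in it is hypothetical and much simplified: `struct toy_pack`, `toy_fill_all()` and `toy_fill_batch()` are not Git's types or functions, the per-pack `expected_size` is assumed to be precomputed, the packs are assumed to be already sorted, and the rule of skipping a pack whose estimate alone reaches the batch size is an assumption modeled loosely on `fill_included_packs_batch()`. The `wanted` flag stands in for the per-pack checks discussed in the next paragraph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a MIDX'd pack; not Git's struct packed_git. */
struct toy_pack {
	size_t expected_size; /* assumed precomputed estimate of bytes to rewrite */
	int wanted;           /* result of a want_included_pack()-style check */
};

/* (b) Combine all packs: include every wanted pack. Returns how many were included. */
static uint32_t toy_fill_all(const struct toy_pack *packs, uint32_t nr,
			     unsigned char *include)
{
	uint32_t i, count = 0;

	for (i = 0; i < nr; i++) {
		if (!packs[i].wanted)
			continue;
		include[i] = 1;
		count++;
	}
	return count;
}

/*
 * (a) Fill a batch: walk the packs in the given order (assumed already
 * sorted, e.g. by mtime) and include wanted packs while the running total
 * is below batch_size. A pack whose estimate alone reaches batch_size is
 * skipped. Returns how many were included.
 */
static uint32_t toy_fill_batch(const struct toy_pack *packs, uint32_t nr,
			       unsigned char *include, size_t batch_size)
{
	uint32_t i, count = 0;
	size_t total = 0;

	assert(batch_size > 0);
	for (i = 0; total < batch_size && i < nr; i++) {
		if (!packs[i].wanted)
			continue;
		if (packs[i].expected_size >= batch_size)
			continue;
		include[i] = 1;
		total += packs[i].expected_size;
		count++;
	}
	return count;
}
```

With wanted packs estimated at 10, 50, 200 and 30 plus one unwanted pack, combining all packs picks four, while a batch size of 100 picks three, skipping the pack estimated at 200 on its own.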
> 
> In either case (using the `midx-write.c::fill_included_packs_batch()` or
> `midx-write.c::fill_included_packs_all()` function, respectively), we
> evaluate whether or not we want to repack each MIDX'd pack, according to
> whether it is loadable, kept, cruft, or non-empty.
> 
> The two `fill_included_packs_` functions care about the same
> conditions, except that `fill_included_packs_batch()` also requires
> the pack to be non-empty.
> 
> We could extract two functions (say, `want_included_pack()` and a
> `_nonempty()` variant), but this is not necessary. For the case in
> `fill_included_packs_all()` which does not check the pack size, we add
> all of the pack's objects assuming that the pack meets all other
> criteria. But if the pack is empty, we add all of its zero objects,
> so whether we "accept" or "reject" it is irrelevant.
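The argument above can be checked on a toy model. This is a sketch under stated assumptions: `struct toy_pack` and its fields are hypothetical stand-ins for the result of `prepare_midx_pack()`, `p->pack_keep`, `p->is_cruft` and `p->num_objects`, and `toy_objects_included()` simply sums the objects of every accepted pack. `toy_want()` applies the checks in the same order as the patch's `want_included_pack()`; `toy_want_old_all()` drops the non-empty check, like the pre-patch loop in `fill_included_packs_all()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pack record; each field stands in for one check in want_included_pack(). */
struct toy_pack {
	int loadable;         /* stands in for prepare_midx_pack() succeeding */
	int keep;             /* stands in for p->pack_keep */
	int cruft;            /* stands in for p->is_cruft */
	uint32_t num_objects; /* stands in for p->num_objects */
};

/* Same order of checks as the patch's want_included_pack(), on the toy type. */
static int toy_want(const struct toy_pack *p, int pack_kept_objects)
{
	if (!p->loadable)
		return 0;
	if (!pack_kept_objects && p->keep)
		return 0;
	if (p->cruft)
		return 0;
	if (!p->num_objects)
		return 0;
	return 1;
}

/* The pre-patch "all" predicate: identical, minus the non-empty check. */
static int toy_want_old_all(const struct toy_pack *p, int pack_kept_objects)
{
	if (!p->loadable)
		return 0;
	if (!pack_kept_objects && p->keep)
		return 0;
	if (p->cruft)
		return 0;
	return 1;
}

/* Total number of objects that would be handed to the repack. */
static uint64_t toy_objects_included(const struct toy_pack *packs, size_t nr,
				     int pack_kept_objects,
				     int (*want)(const struct toy_pack *, int))
{
	uint64_t total = 0;
	size_t i;

	assert(packs || !nr);
	for (i = 0; i < nr; i++)
		if (want(&packs[i], pack_kept_objects))
			total += packs[i].num_objects;
	return total;
}
```

For a mix of normal, empty, kept, cruft and unloadable packs, both predicates yield the same object total; they differ only in whether the empty pack itself is marked as included.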
> 
> This change improves the readability of both `fill_included_packs_`
> functions.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx-write.c | 32 ++++++++++++++++++++------------
>  1 file changed, 20 insertions(+), 12 deletions(-)
> 
> diff --git a/midx-write.c b/midx-write.c
> index 5242d2a724..906efa2169 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1349,6 +1349,24 @@ static int compare_by_mtime(const void *a_, const void *b_)
>  	return 0;
>  }
>  
> +static int want_included_pack(struct repository *r,
> +			      struct multi_pack_index *m,
> +			      int pack_kept_objects,
> +			      uint32_t pack_int_id)
> +{
> +	struct packed_git *p;
> +	if (prepare_midx_pack(r, m, pack_int_id))
> +		return 0;
> +	p = m->packs[pack_int_id];
> +	if (!pack_kept_objects && p->pack_keep)
> +		return 0;
> +	if (p->is_cruft)
> +		return 0;
> +	if (open_pack_index(p) || !p->num_objects)
> +		return 0;
> +	return 1;
> +}
> +
>  static int fill_included_packs_all(struct repository *r,
>  				   struct multi_pack_index *m,
>  				   unsigned char *include_pack)
> @@ -1359,11 +1377,7 @@ static int fill_included_packs_all(struct repository *r,
>  	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
>  
>  	for (i = 0; i < m->num_packs; i++) {
> -		if (prepare_midx_pack(r, m, i))
> -			continue;
> -		if (!pack_kept_objects && m->packs[i]->pack_keep)
> -			continue;
> -		if (m->packs[i]->is_cruft)
> +		if (!want_included_pack(r, m, pack_kept_objects, i))
>  			continue;
>  
>  		include_pack[i] = 1;
> @@ -1410,13 +1424,7 @@ static int fill_included_packs_batch(struct repository *r,
>  		struct packed_git *p = m->packs[pack_int_id];
>  		size_t expected_size;

I was briefly wondering whether `m->packs[pack_int_id]` could change
after the above assignment. But there's a loop a bit further up here
that already calls `prepare_midx_pack()`, so this shouldn't happen.
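To illustrate that reasoning, a minimal sketch of the two-loop shape with hypothetical types (`struct toy_midx`, `toy_prepare()` and `toy_want()` are not Git's API). It assumes, as described in this review, that the prepare step returns early once a slot is populated. Because every slot is filled in the first loop, a pointer snapshotted in the second loop before calling a helper that prepares the slot again is still the current one afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PACKS 4

/* Hypothetical stand-in for a pack. */
struct toy_pack {
	uint32_t id;
};

/* Hypothetical MIDX-like table of lazily filled slots. */
struct toy_midx {
	uint32_t num_packs;                     /* must be <= TOY_MAX_PACKS */
	struct toy_pack *packs[TOY_MAX_PACKS];  /* NULL until prepared */
	struct toy_pack storage[TOY_MAX_PACKS]; /* objects handed out on first prepare */
};

/* Lazily fill slot `id`; returns 0 on success (this toy never fails). */
static int toy_prepare(struct toy_midx *m, uint32_t id)
{
	assert(m->num_packs <= TOY_MAX_PACKS && id < m->num_packs);
	if (m->packs[id])
		return 0; /* already populated: leave the slot untouched */
	m->storage[id].id = id;
	m->packs[id] = &m->storage[id];
	return 0;
}

/* Helper that, like want_included_pack(), prepares the slot itself first. */
static int toy_want(struct toy_midx *m, uint32_t id)
{
	if (toy_prepare(m, id))
		return 0;
	return m->packs[id] != NULL;
}

/* Prepare every slot, then snapshot and call the helper; 1 if no snapshot went stale. */
static int toy_snapshots_stay_valid(struct toy_midx *m)
{
	uint32_t i;

	for (i = 0; i < m->num_packs; i++)
		toy_prepare(m, i);
	for (i = 0; i < m->num_packs; i++) {
		struct toy_pack *p = m->packs[i]; /* snapshot before the helper runs */
		if (!toy_want(m, i))
			continue;
		if (p != m->packs[i])
			return 0;
	}
	return 1;
}
```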

> -		if (!p)
> -			continue;
> -		if (!pack_kept_objects && p->pack_keep)
> -			continue;
> -		if (p->is_cruft)
> -			continue;
> -		if (open_pack_index(p) || !p->num_objects)
> +		if (!want_included_pack(r, m, pack_kept_objects, pack_int_id))
>  			continue;

Another thing I wondered about was whether calling `prepare_midx_pack()`
twice now hurts performance. But this shouldn't matter either, as that
function returns early when `m->packs[pack_int_id]` is already
populated.
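The early return is a memoization pattern. A hypothetical sketch (not Git code; `struct toy_table`, `toy_prepare()` and `toy_release()` are invented for illustration, and `malloc()` merely stands in for opening a pack) counts how often the expensive path runs, showing that a second call for the same slot costs only a NULL check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_SLOTS 8

/* Hypothetical lazily loaded slot table; `loads` counts expensive loads. */
struct toy_table {
	void *slots[TOY_SLOTS];
	unsigned loads;
};

/* Returns 0 if slot `id` is available afterwards, 1 if loading failed. */
static int toy_prepare(struct toy_table *t, uint32_t id)
{
	assert(id < TOY_SLOTS);
	if (t->slots[id])
		return 0;               /* already populated: cheap early return */
	t->slots[id] = malloc(1);   /* stands in for the expensive load */
	if (!t->slots[id])
		return 1;
	t->loads++;
	return 0;
}

/* Free every populated slot and reset it to NULL. */
static void toy_release(struct toy_table *t)
{
	size_t i;

	for (i = 0; i < TOY_SLOTS; i++) {
		free(t->slots[i]);
		t->slots[i] = NULL;
	}
}
```

Calling `toy_prepare()` twice for slot 3 performs a single load; the second call takes the early return.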

So this patch looks good to me.

Patrick

>  		expected_size = st_mult(p->pack_size,
> -- 
> 2.44.0.330.g158d2a670b4
> 
> 
