git@vger.kernel.org mailing list mirror (one of many)
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v2 3/3] builtin/repack.c: implement support for `--max-cruft-size`
Date: Thu, 5 Oct 2023 13:35:15 -0400
Message-ID: <ZR7z0/tStMSjBkFR@nand.local>
In-Reply-To: <ZR6nKzflu_18JnoG@tanuki>

On Thu, Oct 05, 2023 at 02:08:11PM +0200, Patrick Steinhardt wrote:
> On Mon, Oct 02, 2023 at 08:44:32PM -0400, Taylor Blau wrote:
> [snip]
> > diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> > index 90806fd26a..fa0541b416 100644
> > --- a/Documentation/git-gc.txt
> > +++ b/Documentation/git-gc.txt
> > @@ -59,6 +59,13 @@ be performed as well.
> >  	cruft pack instead of storing them as loose objects. `--cruft`
> >  	is on by default.
> >
> > +--max-cruft-size=<n>::
> > +	When packing unreachable objects into a cruft pack, limit the
> > +	size of new cruft packs to be at most `<n>`. Overrides any
>
> We should probably mention the unit here, which is bytes.

Perhaps, though I'm OK with omitting it in the name of brevity, since we
link to the relevant section of the git-repack(1) documentation, which
does state the units.

> > +static void collapse_small_cruft_packs(FILE *in, size_t max_size,
> > +				       struct existing_packs *existing)
> > +{
> > +	struct packed_git **existing_cruft, *p;
> > +	struct strbuf buf = STRBUF_INIT;
> > +	size_t total_size = 0;
> > +	size_t existing_cruft_nr = 0;
> > +	size_t i;
> > +
> > +	ALLOC_ARRAY(existing_cruft, existing->cruft_packs.nr);
> > +
> > +	for (p = get_all_packs(the_repository); p; p = p->next) {
> > +		if (!(p->is_cruft && p->pack_local))
> > +			continue;
> > +
> > +		strbuf_reset(&buf);
> > +		strbuf_addstr(&buf, pack_basename(p));
> > +		strbuf_strip_suffix(&buf, ".pack");
> > +
> > +		if (!string_list_has_string(&existing->cruft_packs, buf.buf))
> > +			continue;
> > +
> > +		if (existing_cruft_nr >= existing->cruft_packs.nr)
> > +			BUG("too many cruft packs (found %"PRIuMAX", but knew "
> > +			    "of %"PRIuMAX")",
> > +			    (uintmax_t)existing_cruft_nr + 1,
> > +			    (uintmax_t)existing->cruft_packs.nr);
> > +		existing_cruft[existing_cruft_nr++] = p;
> > +	}
> > +
> > +	QSORT(existing_cruft, existing_cruft_nr, existing_cruft_pack_cmp);
> > +
> > +	for (i = 0; i < existing_cruft_nr; i++) {
> > +		size_t proposed;
> > +
> > +		p = existing_cruft[i];
> > +		proposed = st_add(total_size, p->pack_size);
> > +
> > +		if (proposed <= max_size) {
> > +			total_size = proposed;
> > +			fprintf(in, "-%s\n", pack_basename(p));
> > +		} else {
> > +			retain_cruft_pack(existing, p);
> > +			fprintf(in, "%s\n", pack_basename(p));
> > +		}
> > +	}
> > +
> > +	for (i = 0; i < existing->non_kept_packs.nr; i++)
> > +		fprintf(in, "-%s.pack\n",
> > +			existing->non_kept_packs.items[i].string);
>
> As far as I can see, the non-kept packs are passed to
> git-pack-objects(1) both in the cases where we do collapse small cruft
> packs and where we don't. Is there any particular reason why we handle
> those in both code paths separately instead of merging that logic? Is
> the ordering of packfiles important here?

No particularly good reason. The ordering isn't important, and you could
do something like this:

--- 8< ---
diff --git a/builtin/repack.c b/builtin/repack.c
index 04770b15fe..6e17fc3f51 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -905,10 +905,6 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size,
 		}
 	}

-	for (i = 0; i < existing->non_kept_packs.nr; i++)
-		fprintf(in, "-%s.pack\n",
-			existing->non_kept_packs.items[i].string);
-
 	strbuf_release(&buf);
 }

@@ -959,14 +955,13 @@ static int write_cruft_pack(const struct pack_objects_args *args,
 	in = xfdopen(cmd.in, "w");
 	for_each_string_list_item(item, names)
 		fprintf(in, "%s-%s.pack\n", pack_prefix, item->string);
-	if (args->max_pack_size && !cruft_expiration) {
+	if (args->max_pack_size && !cruft_expiration)
 		collapse_small_cruft_packs(in, args->max_pack_size, existing);
-	} else {
-		for_each_string_list_item(item, &existing->non_kept_packs)
-			fprintf(in, "-%s.pack\n", item->string);
+	else
 		for_each_string_list_item(item, &existing->cruft_packs)
 			fprintf(in, "-%s.pack\n", item->string);
-	}
+	for_each_string_list_item(item, &existing->non_kept_packs)
+		fprintf(in, "-%s.pack\n", item->string);
 	for_each_string_list_item(item, &existing->kept_packs)
 		fprintf(in, "%s.pack\n", item->string);
 	fclose(in);
--- >8 ---

But I think that having a small amount of duplication is a fair price to
pay for being able to see the whole input given to pack-objects outlined
in a single function.
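
For anyone following along, the selection loop in the quoted
collapse_small_cruft_packs() can be sketched on its own, outside of Git.
This is only an illustration under stated assumptions: `struct cruft_pack`,
`cruft_pack_cmp()` and `collapse_small()` are hypothetical stand-ins, and
the ascending-by-size sort is an assumption, since the body of
existing_cruft_pack_cmp() is not shown in the quoted hunk. As I understand
the cruft-mode input of git-pack-objects(1), a leading "-" marks a pack
that goes away once the new cruft pack is written, and a bare name marks
one that stays.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two packed_git fields the loop uses. */
struct cruft_pack {
	const char *name;	/* e.g. "pack-1234.pack" */
	size_t size;		/* on-disk size in bytes */
};

/* Assumed ordering: smallest pack first. */
static int cruft_pack_cmp(const void *va, const void *vb)
{
	const struct cruft_pack *a = va;
	const struct cruft_pack *b = vb;

	if (a->size < b->size)
		return -1;
	if (a->size > b->size)
		return 1;
	return 0;
}

/*
 * Sort `packs` by size, then greedily combine packs while their running
 * total stays within `max_size`. Combined packs are written to `out`
 * with a "-" prefix, retained packs are written bare. Returns the number
 * of packs that were combined.
 */
static size_t collapse_small(FILE *out, struct cruft_pack *packs,
			     size_t nr, size_t max_size)
{
	size_t total = 0, collapsed = 0, i;

	qsort(packs, nr, sizeof(*packs), cruft_pack_cmp);

	for (i = 0; i < nr; i++) {
		/*
		 * Same test as "total + size <= max_size", written so that
		 * it cannot overflow (the patch uses st_add() instead).
		 * total never exceeds max_size, so the subtraction is safe.
		 */
		if (packs[i].size <= max_size - total) {
			total += packs[i].size;
			collapsed++;
			fprintf(out, "-%s\n", packs[i].name);
		} else {
			fprintf(out, "%s\n", packs[i].name);
		}
	}

	return collapsed;
}
```

With packs of 700, 100 and 300 bytes and a limit of 500, the 100- and
300-byte packs are written with a "-" prefix and the 700-byte pack is
written bare, so only the two small packs get combined.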

Thanks,
Taylor

Thread overview: 21+ messages
2023-09-07 21:51 [PATCH 0/2] repack: implement `--cruft-max-size` Taylor Blau
2023-09-07 21:52 ` [PATCH 1/2] t7700: split cruft-related tests to t7704 Taylor Blau
2023-09-08  0:01   ` Eric Sunshine
2023-09-07 21:52 ` [PATCH 2/2] builtin/repack.c: implement support for `--cruft-max-size` Taylor Blau
2023-09-07 23:42   ` Junio C Hamano
2023-09-25 18:01     ` Taylor Blau
2023-09-08 11:21   ` Patrick Steinhardt
2023-10-02 20:30     ` Taylor Blau
2023-10-03  0:44 ` [PATCH v2 0/3] repack: implement `--cruft-max-size` Taylor Blau
2023-10-03  0:44   ` [PATCH v2 1/3] t7700: split cruft-related tests to t7704 Taylor Blau
2023-10-03  0:44   ` [PATCH v2 2/3] builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE Taylor Blau
2023-10-05 11:31     ` Patrick Steinhardt
2023-10-05 17:28       ` Taylor Blau
2023-10-05 20:22         ` Junio C Hamano
2023-10-03  0:44   ` [PATCH v2 3/3] builtin/repack.c: implement support for `--max-cruft-size` Taylor Blau
2023-10-05 12:08     ` Patrick Steinhardt
2023-10-05 17:35       ` Taylor Blau [this message]
2023-10-05 20:25       ` Junio C Hamano
2023-10-07 17:20     ` [PATCH] repack: free existing_cruft array after use Jeff King
2023-10-09  1:24       ` Taylor Blau
2023-10-09 17:28         ` Junio C Hamano
