git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH v2] builtin/pack-objects.c: introduce `pack.extraCruftTips`
Date: Wed, 3 May 2023 15:59:06 -0400	[thread overview]
Message-ID: <20230503195906.GA1796816@coredump.intra.peff.net> (raw)
In-Reply-To: <73ad7b90e1fe6c15f41ff828651f7ab06076ffd8.1683072587.git.me@ttaylorr.com>

On Tue, May 02, 2023 at 08:09:47PM -0400, Taylor Blau wrote:

> However, there is no option to be able to keep around certain objects
> that have otherwise aged out of the grace period. The only way to retain
> those objects is:
> 
>   - to point a reference at them, which may be undesirable or
>     infeasible,
>   - to track them via the reflog, which may be undesirable since the
>     reflog's lifetime is limited to that of the reference it's tracking
>     (and callers may want to keep those unreachable objects around for
>     longer)
>   - to extend the grace period, which may keep around other objects that
>     the caller *does* want to discard,
>   - or, to force the caller to construct the pack of objects they want
>     to keep themselves, and then mark the pack as kept by adding a
>     ".keep" file.

OK, I understand the use case you're trying to support, and your
approach mostly makes sense. But there are two things I was surprised by
in the implementation:

  1. Does this need to be tied to cruft packs? The same logic would
     apply to "repack -A" which turns objects loose (of course you
     probably don't want to do that in the long term for performance
     reasons, but it's conceptually the same thing; see below).

  2. Why is there a separate walk distinct from the one that rescues
     recent-but-unreachable objects?

Conceptually it seems to me that the simplest and most flexible way to
think of this new feature is: pretend these objects are recent enough to
be kept in the grace period, even though their mtimes do not qualify".

And then everything else would just fall out naturally. Am I missing
something?

In a pre-cruft-pack world, I'd have just expected the patch to look like
this:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index bd6ad016d6..1d655dc758 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4077,6 +4077,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
 		if (add_unseen_recent_objects_to_traversal(revs,
 				unpack_unreachable_expiration, NULL, 0))
 			die(_("unable to add recent objects"));
+		add_tips_from_program(revs);
 		if (prepare_revision_walk(revs))
 			die(_("revision walk setup failed"));
 		traverse_commit_list(revs, record_recent_commit,

Sadly the cruft-pack feature doesn't really share this code, but I think
it does something similar, and could just consider those synthetic tips
as "recent" for this run.

> +static int enumerate_extra_cruft_tips_1(struct rev_info *revs, const char *args)
> +{
> +	struct child_process cmd = CHILD_PROCESS_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	FILE *out = NULL;
> +	int ret = 0;
> +
> +	cmd.use_shell = 1;
> +	cmd.out = -1;
> +
> +	strvec_push(&cmd.args, args);
> +	strvec_pushv(&cmd.env, (const char **)local_repo_env);

Why does this clear the environment of local-repo variables? That seems
unlike most other hooks we have, and would cause confusion if $GIT_DIR
or various config variables are important (e.g., should "git -c foo.bar
gc" persist foo.bar when running the hook? It usually does).

I know that some hooks that try to change repositories by changing
directories have the opposite confusion ($GIT_DIR is set, but they did
not want it). But it makes more sense to me to remain consistent with
how other hooks behave.

> +	if (start_command(&cmd)) {
> +		ret = -1;
> +		goto done;
> +	}

This may be a matter of taste, but you can "return -1" directly here, as
nothing has been allocated (and a failed start_command() will call
child_process_clear() for you). This would mean "out" is always set in
the "done:" label, so it wouldn't need a NULL-check (nor an
initialization).

> +
> +	out = xfdopen(cmd.out, "r");
> +	while (strbuf_getline_lf(&buf, out) != EOF) {

is there any reason to be a stickler for LF versus CRLF here? I.e.,
wouldn't strbuf_getline() be more friendly, with little chance that we
misinterpret the result?

> +		struct object_id oid;
> +		struct object *obj;
> +		const char *rest;
> +
> +		if (parse_oid_hex(buf.buf, &oid, &rest) || *rest) {
> +			ret = error(_("invalid extra cruft tip: '%s'"), buf.buf);
> +			goto done;
> +		}
> +
> +		obj = parse_object(the_repository, &oid);
> +		if (!obj)
> +			continue;

This parse_object() can be pretty expensive, especially if you are
rescuing a lot of objects (or if your program directly references large
blobs). Can we use oid_object_info() here in combination with
lookup_object_by_type()?

-Peff

  parent reply	other threads:[~2023-05-03 19:59 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-20 17:27 [PATCH] builtin/pack-objects.c: introduce `pack.extraCruftTips` Taylor Blau
2023-04-20 18:12 ` Junio C Hamano
2023-04-20 19:30   ` Taylor Blau
2023-04-20 19:52     ` Junio C Hamano
2023-04-20 20:48       ` Taylor Blau
2023-04-21  0:10 ` Chris Torek
2023-04-21  2:14   ` Taylor Blau
2023-04-25 19:42 ` Derrick Stolee
2023-04-25 21:25   ` Taylor Blau
2023-04-26 10:52     ` Derrick Stolee
2023-05-03  0:06       ` Taylor Blau
2023-05-03  0:09 ` [PATCH v2] " Taylor Blau
2023-05-03 14:01   ` Derrick Stolee
2023-05-03 19:59   ` Jeff King [this message]
2023-05-03 21:22     ` Taylor Blau
2023-05-05 21:23       ` Jeff King
2023-05-06  0:06         ` Taylor Blau
2023-05-06  0:14           ` Taylor Blau
2023-05-03 21:28     ` Taylor Blau
2023-05-05 21:26       ` Jeff King
2023-05-05 22:13         ` Jeff King
2023-05-06  0:13           ` Taylor Blau
2023-05-06  0:20             ` Taylor Blau
2023-05-06  2:12             ` Jeff King
2023-05-03 22:05 ` [PATCH v3] " Taylor Blau
2023-05-03 23:18   ` Junio C Hamano
2023-05-03 23:42     ` Junio C Hamano
2023-05-03 23:48       ` Taylor Blau
2023-05-03 23:50       ` Taylor Blau
2023-05-05 21:39     ` Jeff King
2023-05-05 22:19   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230503195906.GA1796816@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).