git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com
Subject: Re: [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option
Date: Wed, 17 Feb 2021 13:59:08 -0500	[thread overview]
Message-ID: <YC1nfH356wfmAKE2@nand.local> (raw)
In-Reply-To: <YCxZcz5JGtxObOF3@coredump.intra.peff.net>

On Tue, Feb 16, 2021 at 06:46:59PM -0500, Jeff King wrote:
> Do we use this partial traversal to impact the write order at all? That
> would be a nice-to-have, but I suspect that just concatenating the packs
> (presumably by descending mtime) ends up with a similar result.

We don't; the objects are written in pack order. In the version of the
patch you reviewed, the order of packs was determined by their hash (due
to the string_list_sort()), but the version I just prepared re-sorts by
mtime.

It's kind of gross, since we need to use QSORT directly on the
string_list internals in order to have access to the ->util field of the
string_list_items (string_list_sort() only lets you compare strings
directly for obvious reasons).

I added a comment describing this hack.

> > +--stdin-packs::
> > +	Read the basenames of packfiles from the standard input, instead
> > +	of object names or revision arguments. The resulting pack
> > +	contains all objects listed in the included packs (those not
> > +	beginning with `^`), excluding any objects listed in the
> > +	excluded packs (beginning with `^`).
> > ++
> > +Incompatible with `--revs`, or options that imply `--revs` (such as
> > +`--all`), with the exception of `--unpacked`, which is compatible.
>
> I know you say "basename" here, but I wonder if it is worth giving an
> example (`pack-1234abcd.pack`) to make it clear in what form we expect
> it. Or possibly something in the `EXAMPLES` section.

Good idea, thanks.

> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -2979,6 +2979,164 @@ static int git_pack_config(const char *k, const char *v, void *cb)
> >  	return git_default_config(k, v, cb);
> >  }
> >
> > +static int stdin_packs_found_nr;
> > +static int stdin_packs_hints_nr;
>
> I scratched my head at these until I looked further in the code. They're
> the counters for the trace output. Might be worth a brief comment above
> them. (I do approve of adding this kind of trace debugging info; I'm
> pretty accustomed to using gdb or adding one-off debug statements, but
> we really could do a better job in general of making these kinds of
> internals visible to mere mortal admins).

Good call.

> > +static int add_object_entry_from_pack(const struct object_id *oid,
> > +				      struct packed_git *p,
> > +				      uint32_t pos,
> > +				      void *_data)
> > +{
> > +	struct rev_info *revs = _data;
> > +	struct object_info oi = OBJECT_INFO_INIT;
> > +	off_t ofs;
> > +	enum object_type type;
> > +
> > +	display_progress(progress_state, ++nr_seen);
> > +
> > +	ofs = nth_packed_object_offset(p, pos);
> > +
> > +	oi.typep = &type;
> > +	if (packed_object_info(the_repository, p, ofs, &oi) < 0)
> > +		die(_("could not get type of object %s in pack %s"),
> > +		    oid_to_hex(oid), p->pack_name);
>
> Calling out for other reviewers: the oi.typep field will be filled in
> the with _real_ type of the object, even if it's a delta. This is as
> opposed to the return value of packed_object_info(), which may be
> OFS_DELTA or REF_DELTA.
>
> And that real type is what we want here:
>
> > +	else if (type == OBJ_COMMIT) {
> > +		/*
> > +		 * commits in included packs are used as starting points for the
> > +		 * subsequent revision walk
> > +		 */
> > +		add_pending_oid(revs, NULL, oid, 0);
> > +	}
>
> And later when we call create_object_entry().

:-). Yes indeed. As I'm sure that you will recall, the pack-objects
code _does not_ behave well when you give it the packed type of an
object (which is not entirely unexpected, since the pack-objects code
only operates on the true type, so passing the packed type--as I did
when originally writing this patch--is a bug).

> I wondered whether it would be worth adding other objects we might find,
> like trees, in order to increase our traversal. But that doesn't make
> any sense. The whole point is to find the paths, which come from
> traversing from the root trees. And we can only find the root trees by
> starting at commits. Adding any random tree we found would defeat the
> purpose (most of them are sub-trees and would give us a useless partial
> path).

Right.

> Should we avoid adding the commit as a tip for walking if it won't end
> up in the resulting pack? I.e., should we check these:
>
> > +	if (have_duplicate_entry(oid, 0))
> > +		return 0;
> > +
> > +	if (!want_object_in_pack(oid, 0, &p, &ofs))
> > +		return 0;
>
> ...first? I guess it probably doesn't matter too much since we'd
> truncate the traversal as soon as we saw it was in a kept pack anyway.

I agree it doesn't make a difference, but I think placing the extra
guards first makes it easier to read (since the reader doesn't have to
consider how the subsequent traversal would treat it).

> > +static void show_commit_pack_hint(struct commit *commit, void *_data)
> > +{
> > +}
>
> Nothing to do here, since commits don't have a name field. Makes sense.

Yeah. I added a comment to say the same thing, just for extra clarity.

>
> > +static void show_object_pack_hint(struct object *object, const char *name,
> > +				  void *_data)
> > +{
> > +	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
> > +	if (!oe)
> > +		return;
> > +
> > +	/*
> > +	 * Our 'to_pack' list was constructed by iterating all objects packed in
> > +	 * included packs, and so doesn't have a non-zero hash field that you
> > +	 * would typically pick up during a reachability traversal.
> > +	 *
> > +	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
> > +	 * here using a now in order to perhaps improve the delta selection
> > +	 * process.
> > +	 */
> > +	oe->hash = pack_name_hash(name);
> > +	oe->no_try_delta = name && no_try_delta(name);
> > +
> > +	stdin_packs_hints_nr++;
> > +}
>
> But for actual objects, we do fill in the hash. I wonder if it's
> possible for oe->hash to have been already filled. I don't think it
> really matters, though. Any value we get is equally valid, so
> overwriting is OK in that case.

Right.

> > +	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
> > +			   stdin_packs_found_nr);
>
> I wonder if it makes sense to report the actual set of packs via trace
> (obviously not as an int, but as a list). That's less helpful for
> debugging pack-objects, if you just fed it the input anyway, but if you
> were debugging "git repack --geometric" it might be useful to see which
> packs it thought were which (though arguably that would be a useful
> trace in builtin/repack.c instead).

I could see an argument in both ways. I'd rather pass for now until we
have a clearer need for it.

> [passing --unpacked to the namehash traversal]
>
> I'm OK to consider that an implementation detail for now, though. We can
> change it later without impacting the interface.

Agreed.

> > +		if (rev_list_unpacked)
> > +			add_unreachable_loose_objects();
>
> Despite the name, that function is adding both reachable and unreachable
> ones. So it is doing what you want. It might be worth renaming, but it's
> not too big a deal since it's local to this file.

Yeah, I tend to err on the side of "it's fine as-is" since this isn't
exposed outside of pack-objects internals. If you feel strongly I'm
happy to change it, but I suspect you don't.

Thanks,
Taylor

  reply	other threads:[~2021-02-17 19:00 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 23:23 [PATCH 00/10] repack: support repacking into a geometric sequence Taylor Blau
2021-01-19 23:24 ` [PATCH 01/10] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-01-20 13:40   ` Derrick Stolee
2021-01-20 14:38     ` Taylor Blau
2021-01-29  2:33   ` Junio C Hamano
2021-01-29 18:38     ` Taylor Blau
2021-01-29 19:31     ` Jeff King
2021-01-29 20:20       ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 02/10] revision: learn '--no-kept-objects' Taylor Blau
2021-01-29  3:10   ` Junio C Hamano
2021-01-29 19:13     ` Taylor Blau
2021-01-19 23:24 ` [PATCH 03/10] builtin/pack-objects.c: learn '--assume-kept-packs-closed' Taylor Blau
2021-01-29  3:21   ` Junio C Hamano
2021-01-29 19:19     ` Jeff King
2021-01-29 20:01       ` Taylor Blau
2021-01-29 20:25         ` Jeff King
2021-01-29 22:10           ` Taylor Blau
2021-01-29 22:57             ` Jeff King
2021-01-29 23:03             ` Junio C Hamano
2021-01-29 23:28               ` Taylor Blau
2021-02-02  3:04                 ` Taylor Blau
2021-01-29 23:31               ` Jeff King
2021-01-29 22:13           ` Junio C Hamano
2021-01-29 20:30       ` Junio C Hamano
2021-01-29 22:43         ` Jeff King
2021-01-29 22:53           ` Taylor Blau
2021-01-29 23:00             ` Jeff King
2021-01-29 23:10             ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 04/10] p5303: add missing &&-chains Taylor Blau
2021-01-19 23:24 ` [PATCH 05/10] p5303: measure time to repack with keep Taylor Blau
2021-01-29  3:40   ` Junio C Hamano
2021-01-29 19:32     ` Jeff King
2021-01-29 20:04       ` [PATCH] p5303: avoid sed GNU-ism Jeff King
2021-01-29 20:19         ` Eric Sunshine
2021-01-29 20:27           ` Jeff King
2021-01-29 20:36             ` Eric Sunshine
2021-01-29 22:11               ` Taylor Blau
2021-01-29 20:38       ` [PATCH 05/10] p5303: measure time to repack with keep Junio C Hamano
2021-01-29 22:10         ` Jeff King
2021-01-29 23:12           ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 06/10] pack-objects: rewrite honor-pack-keep logic Taylor Blau
2021-01-19 23:24 ` [PATCH 07/10] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-01-19 23:24 ` [PATCH 08/10] builtin/pack-objects.c: teach '--keep-pack-stdin' Taylor Blau
2021-01-19 23:24 ` [PATCH 09/10] builtin/repack.c: extract loose object handling Taylor Blau
2021-01-20 13:59   ` Derrick Stolee
2021-01-20 14:34     ` Taylor Blau
2021-01-20 15:51       ` Derrick Stolee
2021-01-21  3:45     ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 10/10] builtin/repack.c: add '--geometric' option Taylor Blau
2021-01-20 14:05 ` [PATCH 00/10] repack: support repacking into a geometric sequence Derrick Stolee
2021-02-04  3:58 ` [PATCH v2 0/8] " Taylor Blau
2021-02-04  3:58   ` [PATCH v2 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-16 21:42     ` Jeff King
2021-02-16 21:48       ` Taylor Blau
2021-02-04  3:58   ` [PATCH v2 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-16 23:17     ` Jeff King
2021-02-17 18:35       ` Taylor Blau
2021-02-04  3:59   ` [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-16 23:46     ` Jeff King
2021-02-17 18:59       ` Taylor Blau [this message]
2021-02-17 19:21         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-04  3:59   ` [PATCH v2 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-16 23:58     ` Jeff King
2021-02-17  0:02       ` Jeff King
2021-02-17 19:13       ` Taylor Blau
2021-02-17 19:25         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-17 16:05     ` Jeff King
2021-02-17 19:23       ` Taylor Blau
2021-02-17 19:29         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-17 17:11     ` Jeff King
2021-02-17 19:54       ` Taylor Blau
2021-02-17 20:25         ` Jeff King
2021-02-17 20:29           ` Taylor Blau
2021-02-17 21:43             ` Jeff King
2021-02-04  3:59   ` [PATCH v2 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-17 18:17     ` Jeff King
2021-02-17 20:01       ` Taylor Blau
2021-02-17  0:01   ` [PATCH v2 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-17 18:18     ` Jeff King
2021-02-18  3:14 ` [PATCH v3 " Taylor Blau
2021-02-18  3:14   ` [PATCH v3 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-18  3:14   ` [PATCH v3 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-18  3:14   ` [PATCH v3 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-18  3:14   ` [PATCH v3 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-18  3:14   ` [PATCH v3 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-18  3:14   ` [PATCH v3 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-18  3:14   ` [PATCH v3 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-18  3:14   ` [PATCH v3 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-23  0:31   ` [PATCH v3 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23  1:06     ` Taylor Blau
2021-02-23  1:42       ` Jeff King
2021-02-23  2:24 ` [PATCH v4 " Taylor Blau
2021-02-23  2:25   ` [PATCH v4 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-23  2:25   ` [PATCH v4 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-23  2:25   ` [PATCH v4 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-23  8:07     ` Junio C Hamano
2021-02-23 18:51       ` Jeff King
2021-02-23  2:25   ` [PATCH v4 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-23  2:25   ` [PATCH v4 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-23  2:25   ` [PATCH v4 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-23  2:25   ` [PATCH v4 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-23  2:25   ` [PATCH v4 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-24 23:19     ` Junio C Hamano
2021-02-24 23:43       ` Junio C Hamano
2021-03-04 21:40         ` Taylor Blau
2021-03-04 21:55       ` Taylor Blau
2021-02-23  3:39   ` [PATCH v4 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23  7:43   ` Junio C Hamano
2021-02-23 18:44     ` Jeff King
2021-02-23 19:54       ` Martin Fick
2021-02-23 20:06         ` Taylor Blau
2021-02-23 21:57           ` Martin Fick
2021-02-23 20:15         ` Jeff King
2021-02-23 21:41           ` Martin Fick
2021-02-23 21:53             ` Jeff King
2021-02-24 18:13               ` Martin Fick
2021-02-26  6:23                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YC1nfH356wfmAKE2@nand.local \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).