From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com
Subject: Re: [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option
Date: Wed, 17 Feb 2021 13:59:08 -0500 [thread overview]
Message-ID: <YC1nfH356wfmAKE2@nand.local> (raw)
In-Reply-To: <YCxZcz5JGtxObOF3@coredump.intra.peff.net>
On Tue, Feb 16, 2021 at 06:46:59PM -0500, Jeff King wrote:
> Do we use this partial traversal to impact the write order at all? That
> would be a nice-to-have, but I suspect that just concatenating the packs
> (presumably by descending mtime) ends up with a similar result.
We don't; the objects are written in pack order. In the version of the
patch you reviewed, the order of packs was determined by their hash (due
to the string_list_sort()), but the version I just prepared re-sorts by
mtime.
It's kind of gross, since we need to use QSORT directly on the
string_list internals in order to have access to the ->util field of the
string_list_items (string_list_sort() only lets you compare strings
directly for obvious reasons).
I added a comment describing this hack.
> > +--stdin-packs::
> > + Read the basenames of packfiles from the standard input, instead
> > + of object names or revision arguments. The resulting pack
> > + contains all objects listed in the included packs (those not
> > + beginning with `^`), excluding any objects listed in the
> > + excluded packs (beginning with `^`).
> > ++
> > +Incompatible with `--revs`, or options that imply `--revs` (such as
> > +`--all`), with the exception of `--unpacked`, which is compatible.
>
> I know you say "basename" here, but I wonder if it is worth giving an
> example (`pack-1234abcd.pack`) to make it clear in what form we expect
> it. Or possibly something in the `EXAMPLES` section.
Good idea, thanks.
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -2979,6 +2979,164 @@ static int git_pack_config(const char *k, const char *v, void *cb)
> > return git_default_config(k, v, cb);
> > }
> >
> > +static int stdin_packs_found_nr;
> > +static int stdin_packs_hints_nr;
>
> I scratched my head at these until I looked further in the code. They're
> the counters for the trace output. Might be worth a brief comment above
> them. (I do approve of adding this kind of trace debugging info; I'm
> pretty accustomed to using gdb or adding one-off debug statements, but
> we really could do a better job in general of making these kinds of
> internals visible to mere mortal admins).
Good call.
> > +static int add_object_entry_from_pack(const struct object_id *oid,
> > + struct packed_git *p,
> > + uint32_t pos,
> > + void *_data)
> > +{
> > + struct rev_info *revs = _data;
> > + struct object_info oi = OBJECT_INFO_INIT;
> > + off_t ofs;
> > + enum object_type type;
> > +
> > + display_progress(progress_state, ++nr_seen);
> > +
> > + ofs = nth_packed_object_offset(p, pos);
> > +
> > + oi.typep = &type;
> > + if (packed_object_info(the_repository, p, ofs, &oi) < 0)
> > + die(_("could not get type of object %s in pack %s"),
> > + oid_to_hex(oid), p->pack_name);
>
> Calling out for other reviewers: the oi.typep field will be filled in
> the with _real_ type of the object, even if it's a delta. This is as
> opposed to the return value of packed_object_info(), which may be
> OFS_DELTA or REF_DELTA.
>
> And that real type is what we want here:
>
> > + else if (type == OBJ_COMMIT) {
> > + /*
> > + * commits in included packs are used as starting points for the
> > + * subsequent revision walk
> > + */
> > + add_pending_oid(revs, NULL, oid, 0);
> > + }
>
> And later when we call create_object_entry().
:-). Yes indeed. As I'm sure that you will recall, the pack-objects
code _does not_ behave well when you give it the packed type of an
object (which is not entirely unexpected, since the pack-objects code
only operates on the true type, so passing the packed type--as I did
when originally writing this patch--is a bug).
> I wondered whether it would be worth adding other objects we might find,
> like trees, in order to increase our traversal. But that doesn't make
> any sense. The whole point is to find the paths, which come from
> traversing from the root trees. And we can only find the root trees by
> starting at commits. Adding any random tree we found would defeat the
> purpose (most of them are sub-trees and would give us a useless partial
> path).
Right.
> Should we avoid adding the commit as a tip for walking if it won't end
> up in the resulting pack? I.e., should we check these:
>
> > + if (have_duplicate_entry(oid, 0))
> > + return 0;
> > +
> > + if (!want_object_in_pack(oid, 0, &p, &ofs))
> > + return 0;
>
> ...first? I guess it probably doesn't matter too much since we'd
> truncate the traversal as soon as we saw it was in a kept pack anyway.
I agree it doesn't make a difference, but I think placing the extra
guards first makes it easier to read (since the reader doesn't have to
consider how the subsequent traversal would treat it).
> > +static void show_commit_pack_hint(struct commit *commit, void *_data)
> > +{
> > +}
>
> Nothing to do here, since commits don't have a name field. Makes sense.
Yeah. I added a comment to say the same thing, just for extra clarity.
>
> > +static void show_object_pack_hint(struct object *object, const char *name,
> > + void *_data)
> > +{
> > + struct object_entry *oe = packlist_find(&to_pack, &object->oid);
> > + if (!oe)
> > + return;
> > +
> > + /*
> > + * Our 'to_pack' list was constructed by iterating all objects packed in
> > + * included packs, and so doesn't have a non-zero hash field that you
> > + * would typically pick up during a reachability traversal.
> > + *
> > + * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
> > + * here using a now in order to perhaps improve the delta selection
> > + * process.
> > + */
> > + oe->hash = pack_name_hash(name);
> > + oe->no_try_delta = name && no_try_delta(name);
> > +
> > + stdin_packs_hints_nr++;
> > +}
>
> But for actual objects, we do fill in the hash. I wonder if it's
> possible for oe->hash to have been already filled. I don't think it
> really matters, though. Any value we get is equally valid, so
> overwriting is OK in that case.
Right.
> > + trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
> > + stdin_packs_found_nr);
>
> I wonder if it makes sense to report the actual set of packs via trace
> (obviously not as an int, but as a list). That's less helpful for
> debugging pack-objects, if you just fed it the input anyway, but if you
> were debugging "git repack --geometric" it might be useful to see which
> packs it thought were which (though arguably that would be a useful
> trace in builtin/repack.c instead).
I could see an argument in both ways. I'd rather pass for now until we
have a clearer need for it.
> [passing --unpacked to the namehash traversal]
>
> I'm OK to consider that an implementation detail for now, though. We can
> change it later without impacting the interface.
Agreed.
> > + if (rev_list_unpacked)
> > + add_unreachable_loose_objects();
>
> Despite the name, that function is adding both reachable and unreachable
> ones. So it is doing what you want. It might be worth renaming, but it's
> not too big a deal since it's local to this file.
Yeah, I tend to err on the side of "it's fine as-is" since this isn't
exposed outside of pack-objects internals. If you feel strongly I'm
happy to change it, but I suspect you don't.
Thanks,
Taylor
next prev parent reply other threads:[~2021-02-17 19:00 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-19 23:23 [PATCH 00/10] repack: support repacking into a geometric sequence Taylor Blau
2021-01-19 23:24 ` [PATCH 01/10] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-01-20 13:40 ` Derrick Stolee
2021-01-20 14:38 ` Taylor Blau
2021-01-29 2:33 ` Junio C Hamano
2021-01-29 18:38 ` Taylor Blau
2021-01-29 19:31 ` Jeff King
2021-01-29 20:20 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 02/10] revision: learn '--no-kept-objects' Taylor Blau
2021-01-29 3:10 ` Junio C Hamano
2021-01-29 19:13 ` Taylor Blau
2021-01-19 23:24 ` [PATCH 03/10] builtin/pack-objects.c: learn '--assume-kept-packs-closed' Taylor Blau
2021-01-29 3:21 ` Junio C Hamano
2021-01-29 19:19 ` Jeff King
2021-01-29 20:01 ` Taylor Blau
2021-01-29 20:25 ` Jeff King
2021-01-29 22:10 ` Taylor Blau
2021-01-29 22:57 ` Jeff King
2021-01-29 23:03 ` Junio C Hamano
2021-01-29 23:28 ` Taylor Blau
2021-02-02 3:04 ` Taylor Blau
2021-01-29 23:31 ` Jeff King
2021-01-29 22:13 ` Junio C Hamano
2021-01-29 20:30 ` Junio C Hamano
2021-01-29 22:43 ` Jeff King
2021-01-29 22:53 ` Taylor Blau
2021-01-29 23:00 ` Jeff King
2021-01-29 23:10 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 04/10] p5303: add missing &&-chains Taylor Blau
2021-01-19 23:24 ` [PATCH 05/10] p5303: measure time to repack with keep Taylor Blau
2021-01-29 3:40 ` Junio C Hamano
2021-01-29 19:32 ` Jeff King
2021-01-29 20:04 ` [PATCH] p5303: avoid sed GNU-ism Jeff King
2021-01-29 20:19 ` Eric Sunshine
2021-01-29 20:27 ` Jeff King
2021-01-29 20:36 ` Eric Sunshine
2021-01-29 22:11 ` Taylor Blau
2021-01-29 20:38 ` [PATCH 05/10] p5303: measure time to repack with keep Junio C Hamano
2021-01-29 22:10 ` Jeff King
2021-01-29 23:12 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 06/10] pack-objects: rewrite honor-pack-keep logic Taylor Blau
2021-01-19 23:24 ` [PATCH 07/10] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-01-19 23:24 ` [PATCH 08/10] builtin/pack-objects.c: teach '--keep-pack-stdin' Taylor Blau
2021-01-19 23:24 ` [PATCH 09/10] builtin/repack.c: extract loose object handling Taylor Blau
2021-01-20 13:59 ` Derrick Stolee
2021-01-20 14:34 ` Taylor Blau
2021-01-20 15:51 ` Derrick Stolee
2021-01-21 3:45 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 10/10] builtin/repack.c: add '--geometric' option Taylor Blau
2021-01-20 14:05 ` [PATCH 00/10] repack: support repacking into a geometric sequence Derrick Stolee
2021-02-04 3:58 ` [PATCH v2 0/8] " Taylor Blau
2021-02-04 3:58 ` [PATCH v2 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-16 21:42 ` Jeff King
2021-02-16 21:48 ` Taylor Blau
2021-02-04 3:58 ` [PATCH v2 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-16 23:17 ` Jeff King
2021-02-17 18:35 ` Taylor Blau
2021-02-04 3:59 ` [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-16 23:46 ` Jeff King
2021-02-17 18:59 ` Taylor Blau [this message]
2021-02-17 19:21 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-04 3:59 ` [PATCH v2 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-16 23:58 ` Jeff King
2021-02-17 0:02 ` Jeff King
2021-02-17 19:13 ` Taylor Blau
2021-02-17 19:25 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-17 16:05 ` Jeff King
2021-02-17 19:23 ` Taylor Blau
2021-02-17 19:29 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-17 17:11 ` Jeff King
2021-02-17 19:54 ` Taylor Blau
2021-02-17 20:25 ` Jeff King
2021-02-17 20:29 ` Taylor Blau
2021-02-17 21:43 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-17 18:17 ` Jeff King
2021-02-17 20:01 ` Taylor Blau
2021-02-17 0:01 ` [PATCH v2 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-17 18:18 ` Jeff King
2021-02-18 3:14 ` [PATCH v3 " Taylor Blau
2021-02-18 3:14 ` [PATCH v3 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-18 3:14 ` [PATCH v3 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-18 3:14 ` [PATCH v3 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-18 3:14 ` [PATCH v3 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-18 3:14 ` [PATCH v3 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-18 3:14 ` [PATCH v3 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-18 3:14 ` [PATCH v3 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-18 3:14 ` [PATCH v3 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-23 0:31 ` [PATCH v3 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23 1:06 ` Taylor Blau
2021-02-23 1:42 ` Jeff King
2021-02-23 2:24 ` [PATCH v4 " Taylor Blau
2021-02-23 2:25 ` [PATCH v4 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-23 2:25 ` [PATCH v4 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-23 2:25 ` [PATCH v4 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-23 8:07 ` Junio C Hamano
2021-02-23 18:51 ` Jeff King
2021-02-23 2:25 ` [PATCH v4 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-23 2:25 ` [PATCH v4 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-23 2:25 ` [PATCH v4 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-23 2:25 ` [PATCH v4 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-23 2:25 ` [PATCH v4 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-24 23:19 ` Junio C Hamano
2021-02-24 23:43 ` Junio C Hamano
2021-03-04 21:40 ` Taylor Blau
2021-03-04 21:55 ` Taylor Blau
2021-02-23 3:39 ` [PATCH v4 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23 7:43 ` Junio C Hamano
2021-02-23 18:44 ` Jeff King
2021-02-23 19:54 ` Martin Fick
2021-02-23 20:06 ` Taylor Blau
2021-02-23 21:57 ` Martin Fick
2021-02-23 20:15 ` Jeff King
2021-02-23 21:41 ` Martin Fick
2021-02-23 21:53 ` Jeff King
2021-02-24 18:13 ` Martin Fick
2021-02-26 6:23 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YC1nfH356wfmAKE2@nand.local \
--to=me@ttaylorr.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).