git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Rafael Silva <rafaeloliveira.cs@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: rather slow 'git repack' in 'blob:none' partial clones
Date: Wed, 14 Apr 2021 01:14:14 -0400	[thread overview]
Message-ID: <YHZ6JvLBNpZVOqiX@coredump.intra.peff.net> (raw)
In-Reply-To: <20210413180552.GI2947267@szeder.dev>

On Tue, Apr 13, 2021 at 08:05:52PM +0200, SZEDER Gábor wrote:

> >   [but now ask it to exclude promisor objects, and it's much slower;
> >   this is because is_promisor_object() opens up each tree in the pack in
> >   order to see which "promised" objects it mentions]
> 
> I don't understand this: 'git rev-list --count --all' only counts
> commit objects, so why should it open any trees at all?

Because the promisor code is a bit over-eager. There's actually one
small error in what I wrote above. In that particular rev-list, we're
not calling is_promisor_object() at all, because we'll already have
excluded all the promisor objects by marking them UNINTERESTING and
SEEN.

So:

  - in is_promisor_object(), we load the _whole_ list of promisor
    objects, which requires opening trees to find out about referenced
    blobs. In theory it could know that we only care about commits, but
    it's not connected to a particular traversal, so it gets the whole
    list.

  - mark_uninteresting() is the code where we pre-mark the objects from
    the promisor pack as UNINTERESTING, and that was loading all of the
    trees in this case. And it could know that we are not looking at
    --objects, so there's no need to touch trees. But after my patches,
    we do not load the contents of _any_ objects at all in that
    function. We could avoid even creating "struct object" for
    non-commits there, too, but that would imply looking up the type of
    each object (so more CPU, though it would save us some memory when
    we only care about commits). I suspect in practice that most callers
    would generally pass --objects anyway, though (e.g., your original
    pack-objects that started this thread certainly cares about
    non-commits).

> > +	/*
> > +	 * yikes, do we really need to parse here? maybe
> 
> Heh, a "yikes" here and a "yuck" in your previous patch...  This issue
> was worth reporting :)

Yeah. I think the client side of a lot of this partial-clone / promisor
stuff is not very mature. It's waiting on people to start using it and
finding all of these rough edges.

-Peff

  reply	other threads:[~2021-04-14  5:14 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-03  9:04 rather slow 'git repack' in 'blob:none' partial clones SZEDER Gábor
2021-04-05  1:02 ` Rafael Silva
2021-04-07 21:17   ` Jeff King
2021-04-08  0:02     ` Jonathan Tan
2021-04-08  0:35       ` Jeff King
2021-04-12  7:09     ` Rafael Silva
2021-04-12 21:36     ` SZEDER Gábor
2021-04-12 21:49       ` Bryan Turner
2021-04-12 23:51         ` Jeff King
2021-04-12 23:47       ` Jeff King
2021-04-13  7:12         ` [PATCH 0/3] low-hanging performance fruit with promisor packs Jeff King
2021-04-13  7:15           ` [PATCH 1/3] is_promisor_object(): free tree buffer after parsing Jeff King
2021-04-13 20:17             ` Junio C Hamano
2021-04-14  5:18               ` Jeff King
2021-04-13  7:16           ` [PATCH 2/3] lookup_unknown_object(): take a repository argument Jeff King
2021-04-13  7:17           ` [PATCH 3/3] revision: avoid parsing with --exclude-promisor-objects Jeff King
2021-04-13 20:22             ` Junio C Hamano
2021-04-13 18:10           ` [PATCH 0/3] low-hanging performance fruit with promisor packs SZEDER Gábor
2021-04-14 17:14           ` Jonathan Tan
2021-04-14 19:22           ` Rafael Silva
2021-04-13 18:05         ` rather slow 'git repack' in 'blob:none' partial clones SZEDER Gábor
2021-04-14  5:14           ` Jeff King [this message]
2021-04-11 10:59   ` SZEDER Gábor
2021-04-12  7:53     ` Rafael Silva
2021-04-14 19:14 ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Rafael Silva
2021-04-14 19:14   ` [PATCH 1/2] repack: teach --no-prune-packed to skip `git prune-packed` Rafael Silva
2021-04-14 23:50     ` Jonathan Tan
2021-04-18 14:15       ` Rafael Silva
2021-04-14 19:14   ` [PATCH 2/2] repack: avoid loosening promisor pack objects in partial clones Rafael Silva
2021-04-15  1:04     ` Jonathan Tan
2021-04-15  3:51       ` Junio C Hamano
2021-04-15  9:03         ` Jeff King
2021-04-15  9:05       ` Jeff King
2021-04-18  7:12       ` Rafael Silva
2021-04-15 18:06     ` Junio C Hamano
2021-04-18  8:40       ` Rafael Silva
2021-04-14 22:10   ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Junio C Hamano
2021-04-15  9:15   ` Jeff King
2021-04-18  8:20     ` Rafael Silva
2021-04-18 13:57   ` [PATCH v2 0/1] " Rafael Silva
2021-04-18 13:57     ` [PATCH v2 1/1] repack: avoid loosening promisor objects in partial clones Rafael Silva
2021-04-19 19:15       ` Jonathan Tan
2021-04-21 18:54         ` Rafael Silva
2021-04-19 23:09       ` Junio C Hamano
2021-04-21 19:25         ` Rafael Silva
2021-04-21 19:32     ` [PATCH v3] " Rafael Silva

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YHZ6JvLBNpZVOqiX@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=rafaeloliveira.cs@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).