From: Jeff King <peff@peff.net>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>,
git@vger.kernel.org, dstolee@microsoft.com
Subject: looking up object types quickly, was Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Sun, 22 Mar 2020 14:45:19 -0400 [thread overview]
Message-ID: <20200322184519.GA654322@coredump.intra.peff.net> (raw)
In-Reply-To: <20200322110424.GC2224@szeder.dev>
On Sun, Mar 22, 2020 at 12:04:24PM +0100, SZEDER Gábor wrote:
> > > Where the last step is taking all commits listed in any pack, which is
> > > cheap to iterate.
> >
> > I'm not sure it's all that cheap. It has to find the type of every
> > object in every pack. And finding types involves walking delta chains.
>
> Indeed, that's just about the most expensive way to find all commits,
> in my experience it can be well over a magnitude slower than
> '--reachable'.
They don't necessarily produce the same set of commits, of course, but
for the purposes of "commit-graph write" I think they can generally be
considered interchangeable.
The speed depends on a couple of factors. One is just the makeup of the
repository: tree size, how many are deltas, etc. But a big one is
whether commit-graphs are already enabled. :)
On linux.git without commit-graphs, both of these take about 7.5s for
me:
git cat-file --batch-all-objects --unordered --batch-check='%(objecttype) %(objectname)'
git rev-list --all
But with graphs, the latter takes about 700ms.
I was coincidentally thinking about this problem ("show me all commits")
for something unrelated recently, and I do think that cat-file
invocation could be sped up in some cases. Here are a few ideas:
1. If the pack has a .bitmap file, it contains 4 "type" bitmaps that you
can use to quickly determine the type of a given object. But:
- the current code for loading bitmaps files is slower than we'd
need; it indexes all of the commit bitmaps, but we wouldn't use
them. So we'd want something that just loads the type bitmaps and
stops.
- you have to know the bit position of the object you're interested
in, which implies computing the revidx. We could provide a "bit
position" cache in the bitmap file (basically one uint32_t for each
object giving the packfile order position, but stored in pack .idx
order).
Neither is too hard a problem, but it's clearly not just a small
patch to consult the bitmap.
It could be patched directly into oid_object_info_extended(), which
could do individual type lookups this way. But it would be faster
still if callers could say "I'm only interested in commits", in which
case we could just quickly walk the commit-type bitmap in one pass
(to get the commits in just that packfile, and then go through the
other objects the old-fashioned way).
2. The commit-graph file can quickly tell us whether an object is a
commit. But unfortunately it can give us a definite "yes", but not a
definite "no" (because it might be a commit that just isn't (yet) in
the graph file). So we'd have to find the real type of any
non-commits, and I think that's where we spend the bulk of our time
anyway.
3. To find the type of a delta object, we have to walk back to a base
object with a concrete type. When we access the actual object
contents, we cache intermediate bases to avoid digging down the same
chain over and over. But I don't think there's any such cache for the
type lookup. So if you have a 50-deep delta chain and want the type
of each object, you'll walk down 49 links, then 48 links, and so on.
It probably wouldn't be that hard to have a small type cache (or just
allow the existing cache to have entries for "I know the type, but
don't have the contents").
I suspect (1) would give the biggest speedup, but is more work and
requires bitmaps. I think (3) is likely to give a moderate win and
probably isn't that big a patch.
-Peff
next prev parent reply other threads:[~2020-03-22 18:45 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-21 3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21 3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21 5:00 ` Jeff King
2020-03-21 6:11 ` Taylor Blau
2020-03-21 6:24 ` Taylor Blau
2020-03-21 7:03 ` Jeff King
2020-03-21 17:27 ` Taylor Blau
2020-03-22 5:36 ` Jeff King
2020-03-22 11:04 ` SZEDER Gábor
2020-03-22 18:45 ` Jeff King [this message]
2020-03-22 19:18 ` looking up object types quickly, was " Jeff King
2020-03-23 20:15 ` Taylor Blau
2020-03-22 16:45 ` Taylor Blau
2020-03-24 6:06 ` Jeff King
2020-03-21 18:50 ` Junio C Hamano
2020-03-22 0:03 ` Derrick Stolee
2020-03-22 0:20 ` Taylor Blau
2020-03-22 0:23 ` Derrick Stolee
2020-03-22 5:49 ` Jeff King
2020-03-22 6:04 ` Jeff King
2020-03-22 15:47 ` Taylor Blau
2020-03-24 6:11 ` Jeff King
2020-03-24 23:08 ` Taylor Blau
2020-03-27 8:42 ` Jeff King
2020-03-27 15:03 ` Taylor Blau
2020-03-22 15:44 ` Taylor Blau
2020-03-24 6:14 ` Jeff King
2020-03-21 5:01 ` Junio C Hamano
2020-03-21 4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21 5:04 ` Jeff King
2020-03-21 6:12 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200322184519.GA654322@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).