git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Jeff King" <peff@peff.net>,
	"Felipe Contreras" <felipe.contreras@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Chris Torek" <chris.torek@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Taylor Blau" <me@ttaylorr.com>
Subject: Re: [PATCH v3 3/4] revision: avoid loading object headers multiple times
Date: Fri, 6 Aug 2021 16:17:08 +0200	[thread overview]
Message-ID: <YQ1EZAylyI4pAC0E@ncase> (raw)
In-Reply-To: <YQkHUZKZ6gkFkaD0@ncase>

[-- Attachment #1: Type: text/plain, Size: 3077 bytes --]

On Tue, Aug 03, 2021 at 11:07:29AM +0200, Patrick Steinhardt wrote:
> On Mon, Aug 02, 2021 at 12:40:56PM -0700, Junio C Hamano wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> > 
> > > When loading references, we try to optimize loading of commits by using
> > > the commit graph. To do so, we first need to determine whether the
> > > object actually is a commit or not, which is why we always execute
> > > `oid_object_info()` first. Like this, we'll unpack the object header of
> > > each object first.
> > >
> > > This pattern can be quite inefficient in case many references point to
> > > the same commit: if the object didn't end up in the cached objects, then
> > > we'll repeatedly unpack the same object header, even if we've already
> > > seen the object before.
> > > ...
> > > Assuming that in almost all repositories, most references will point to
> > > either a tag or a commit, we'd have a modest increase in memory
> > > consumption of about 12.5% here.
> > 
> > I wonder if we can also say almost all repositories, the majority of
> > refs point at the same object.  If that holds, this would certainly
> > be a win, but otherwise, it is not so clear.
> 
> I doubt that's the case in general. I rather assume that it's typically
> going to be a smallish subset that points to the same commit, but for
> these cases we at least avoid doing the lookup multiple times. As I
> said, it's definitely a tradeoff between memory and performance: in the
> worst case (all references point to different blobs) we allocate 33%
> more memory without having any speedups. A more realistic scenario would
> probably be something like a trunk-based development repo, where there's
> a single branch only and the rest is tags. There we'd allocate 11% more
> memory without any speedups. In general, it's going to be various shades
> of gray, where we allocate something from 0% to 11% more memory while
> getting some modest speedups in some cases.
> 
> So if we only inspect this commit as a standalone it's definitely
> debatable whether we'd want to take it or not. But one important thing
> is that it's a prerequisite for patch 4/4: in order to not parse commits
> in case they're part of the commit-graph, we need to first obtain an
> object such that we can fill it in via the graph. So we have to call
> `lookup_unknown_object()` anyway. Might be sensible to document this as
> part of the commit message.

Scratch that: we can just rewrite this to use `lookup_object()` to check
whether we already know the object, and then we can rewrite the
`parse_commit_in_graph_gently()` to be `lookup_commit_in_graph()`, which
does a `lookup_commit()` to create the object in case the OID was found
in the graph. That's also got nicer calling semantics..

It's negligibly slower (3.021s instead of 2.983s), but it doesn't need
me arguing about the performance/memory tradeoff. Which is a good thing
I guess: there will always be that one repo that's completely different
and where such assumptions don't hold.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2021-08-06 14:17 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-28  5:33 [PATCH v2 0/3] Speed up connectivity checks via bitmaps Patrick Steinhardt
2021-06-28  5:33 ` [PATCH v2 1/3] p5400: add perf tests for git-receive-pack(1) Patrick Steinhardt
2021-06-28  7:49   ` Ævar Arnfjörð Bjarmason
2021-06-29  6:18     ` Patrick Steinhardt
2021-06-29 12:09       ` Ævar Arnfjörð Bjarmason
2021-06-28  5:33 ` [PATCH v2 2/3] receive-pack: skip connectivity checks on delete-only commands Patrick Steinhardt
2021-06-28  8:00   ` Ævar Arnfjörð Bjarmason
2021-06-28  8:06     ` Ævar Arnfjörð Bjarmason
2021-06-29  6:26     ` Patrick Steinhardt
2021-06-30  1:31   ` Jeff King
2021-06-30  1:35     ` Jeff King
2021-06-30 13:52     ` Patrick Steinhardt
2021-06-28  5:33 ` [PATCH v2 3/3] connected: implement connectivity check using bitmaps Patrick Steinhardt
2021-06-28 20:23   ` Taylor Blau
2021-06-29 22:44     ` Taylor Blau
2021-06-30  2:04       ` Jeff King
2021-06-30  3:07         ` Taylor Blau
2021-06-30  5:45           ` Jeff King
2021-07-02 17:44             ` Taylor Blau
2021-07-02 21:21               ` Jeff King
2021-06-30  1:51   ` Jeff King
2021-07-20 14:26     ` Patrick Steinhardt
2021-08-02  9:37 ` [PATCH v3 0/4] Speed up connectivity checks Patrick Steinhardt
2021-08-02  9:38   ` [PATCH v3 1/4] connected: do not sort input revisions Patrick Steinhardt
2021-08-02 12:49     ` Ævar Arnfjörð Bjarmason
2021-08-03  8:50       ` Patrick Steinhardt
2021-08-04 11:01         ` Ævar Arnfjörð Bjarmason
2021-08-02 19:00     ` Junio C Hamano
2021-08-03  8:55       ` Patrick Steinhardt
2021-08-03 21:47         ` Junio C Hamano
2021-08-02  9:38   ` [PATCH v3 2/4] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-02 12:53     ` Ævar Arnfjörð Bjarmason
2021-08-02  9:38   ` [PATCH v3 3/4] revision: avoid loading object headers multiple times Patrick Steinhardt
2021-08-02 12:55     ` Ævar Arnfjörð Bjarmason
2021-08-05 10:12       ` Patrick Steinhardt
2021-08-02 19:40     ` Junio C Hamano
2021-08-03  9:07       ` Patrick Steinhardt
2021-08-06 14:17         ` Patrick Steinhardt [this message]
2021-08-02  9:38   ` [PATCH v3 4/4] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt
2021-08-02 20:01     ` Junio C Hamano
2021-08-03  9:16       ` Patrick Steinhardt
2021-08-03 21:56         ` Junio C Hamano
2021-08-05 11:01           ` Patrick Steinhardt
2021-08-05 16:16             ` Junio C Hamano
2021-08-04 10:51         ` Ævar Arnfjörð Bjarmason
2021-08-05 11:25   ` [PATCH v4 0/6] Speed up connectivity checks Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 1/6] revision: separate walk and unsorted flags Patrick Steinhardt
2021-08-05 18:47       ` Junio C Hamano
2021-08-05 11:25     ` [PATCH v4 2/6] connected: do not sort input revisions Patrick Steinhardt
2021-08-05 18:44       ` Junio C Hamano
2021-08-06  6:00         ` Patrick Steinhardt
2021-08-06 16:50           ` Junio C Hamano
2021-08-05 11:25     ` [PATCH v4 3/6] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 4/6] revision: avoid loading object headers multiple times Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 5/6] commit-graph: split out function to search commit position Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 6/6] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt
2021-08-09  8:00 ` [PATCH v5 0/5] Speed up connectivity checks Patrick Steinhardt
2021-08-09  8:02   ` Patrick Steinhardt
2021-08-09  8:11 ` Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 1/5] revision: separate walk and unsorted flags Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 2/5] connected: do not sort input revisions Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 3/5] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 4/5] commit-graph: split out function to search commit position Patrick Steinhardt
2021-08-09  8:12   ` [PATCH v5 5/5] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YQ1EZAylyI4pAC0E@ncase \
    --to=ps@pks.im \
    --cc=avarab@gmail.com \
    --cc=chris.torek@gmail.com \
    --cc=felipe.contreras@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).