git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Han-Wen Nienhuys <hanwen@google.com>
Cc: git <git@vger.kernel.org>
Subject: Re: Distinguishing FF vs non-FF updates in the reflog?
Date: Fri, 26 Mar 2021 03:43:43 -0400
Message-ID: <YF2Qr/ClD4a3jCUQ@coredump.intra.peff.net>
In-Reply-To: <CAFQ2z_N8tCMZG62rNSY=HoRGuKnfk1W-Y_GOXz3SeaZO6=cWWA@mail.gmail.com>

On Mon, Mar 22, 2021 at 03:40:46PM +0100, Han-Wen Nienhuys wrote:

> > I left some numbers in another part of the thread, but IMHO performance
> > isn't that compelling a reason to do this these days, if you are using
> > commit-graphs.
> >
> > Just walking the reflog might be _slightly_ faster, though not
> > necessarily (it depends on whether the depth of the object graph or the
> > depth of the reflog chain is deeper). It might matter more if you are
> > using a more exotic storage scheme, where switching from accessing
> > reflogs to objects implies extra round-trips to a server (e.g., custom
> > storage backends with JGit; I don't know the state of the art in what
> > Google is doing there).
> 
> JGit doesn't currently support commit-graph, so it's hard to predict
> what performance will be like, but isn't commit-graph keyed by
> SHA1? That makes it hard to do caching, especially when considering
> large repositories.

Yes, it's keyed by sha1. It's essentially replacing "inflate the commit
object and parse it" with "here are the parsed values as mmap-able
32-bit integer fields" (there's some other stuff with generation
numbers, too, but the main speedup is simply that accessing each commit
is orders of magnitude cheaper).
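
To make the difference concrete, here is a rough Python sketch of the
two access paths. The slow path inflates a zlib-compressed commit object
and scans its headers; the fast path reads integers at a computed
offset. The record layout (RECORD, NO_PARENT) and the function names are
assumptions made up for illustration. They are not the real commit-graph
on-disk format, which also stores the tree id and packs its fields
differently.

```python
import struct
import zlib

# Hypothetical fixed-width record: parent1, parent2, generation, commit time.
# Illustrative only; this is NOT the actual commit-graph layout.
RECORD = struct.Struct(">IIII")
NO_PARENT = 0xFFFFFFFF  # assumed sentinel meaning "no parent in this slot"


def parse_loose_commit(compressed):
    """Slow path: inflate the object, then scan its headers for parents."""
    raw = zlib.decompress(compressed)
    _header, _, body = raw.partition(b"\0")  # strip "commit <size>\0"
    parents = []
    for line in body.split(b"\n"):
        if not line:
            break  # a blank line ends the header section
        if line.startswith(b"parent "):
            parents.append(line[len(b"parent "):].decode())
    return parents


def read_record(buf, position):
    """Fast path: unpack four integers at a computed offset, no inflating."""
    p1, p2, generation, ctime = RECORD.unpack_from(buf, position * RECORD.size)
    parents = [p for p in (p1, p2) if p != NO_PARENT]
    return parents, generation, ctime
```

Since unpack_from() accepts any bytes-like buffer, buf could just as
well be an mmap of a file, which is the property the real format relies
on.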

It caches well, because those properties of the commit are immutable.
But if you meant "when pulling data from the commit-graph file, is it
friendly to a block cache", then no, the access pattern isn't linear.
You'd binary search within it to find each commit, just as you would a
pack .idx. And just like a .idx, I'd expect a system that is pulling
data from a network source to want to grab the whole commit-graph file
(they tend to be much smaller than the main .idx for a given repo).
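
The lookup described above can be sketched roughly like this in Python.
It is an in-memory model, not the file format: a sorted list of object
ids plus a 256-entry fanout table keyed by the first byte, which narrows
the range before the binary search. The names build_index and lookup are
assumptions for illustration.

```python
import bisect


def build_index(oids):
    """Return (sorted_oids, fanout) for a collection of binary object ids."""
    sorted_oids = sorted(set(oids))
    fanout = [0] * 256
    for oid in sorted_oids:
        fanout[oid[0]] += 1
    # Make the counts cumulative: fanout[b] is the number of ids whose
    # first byte is <= b.
    for b in range(1, 256):
        fanout[b] += fanout[b - 1]
    return sorted_oids, fanout


def lookup(sorted_oids, fanout, oid):
    """Return the position of oid in sorted_oids, or None if it is absent."""
    first = oid[0]
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    # Binary search only within the slice sharing the same first byte.
    pos = bisect.bisect_left(sorted_oids, oid, lo, hi)
    if pos < hi and sorted_oids[pos] == oid:
        return pos
    return None
```

Each probe lands at an effectively random spot in the table, which is
why a block cache gets little locality out of it and why fetching the
whole file up front is the simpler strategy.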

> AFAIU, commit-graph would help speed up reachability checks, by being
> able to shortcut cases where the commit number proves that some commit
> is not an ancestor of the other, but you still have to do a revwalk to
> conclusively prove reachability.

Right. You'll still walk a lot of the commits, but you'll do so much
faster (the generation numbers can also help prune some uninteresting
side paths, but again, I think the main value for this operation is just
getting the parent info much faster).
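
As a toy illustration of the pruning, here is a sketch in Python. It
assumes the simple topological definition of a generation number (one
more than the largest generation among the parents, with roots at 1) and
a plain dict mapping each commit to its parents. The function names are
made up for the example, and Git's real walk is more involved.

```python
def compute_generations(parents):
    """Map each commit to 1 + the max generation of its parents (roots get 1)."""
    gen = {}
    for start in parents:
        stack = [start]
        while stack:
            c = stack[-1]
            if c in gen:
                stack.pop()
                continue
            missing = [p for p in parents[c] if p not in gen]
            if missing:
                stack.extend(missing)  # resolve the parents first
            else:
                gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
                stack.pop()
    return gen


def is_ancestor(parents, gen, ancestor, descendant):
    """Walk back from descendant; return (reachable, commits_visited)."""
    target_gen = gen[ancestor]
    seen = {descendant}
    stack = [descendant]
    visited = 0
    while stack:
        c = stack.pop()
        visited += 1
        if c == ancestor:
            return True, visited
        for p in parents[c]:
            # A commit with a lower generation than the target can neither
            # be the target nor have it as an ancestor, so skip that path.
            if p not in seen and gen[p] >= target_gen:
                seen.add(p)
                stack.append(p)
    return False, visited
```

Note that the walk still visits every commit between the two
generations. The generation check only cuts off paths that have already
dropped below the target.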

-Peff
