git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Han-Wen Nienhuys <hanwen@google.com>
Cc: Martin Fick <mfick@codeaurora.org>, git <git@vger.kernel.org>
Subject: Re: Distinguishing FF vs non-FF updates in the reflog?
Date: Thu, 18 Mar 2021 15:35:10 -0400
Message-ID: <YFOrbjeunnVfQNRC@coredump.intra.peff.net>
In-Reply-To: <CAFQ2z_MavgAGDyJzc9-+j6zTDODP7hCdPHtB5dyx-reLMSLX3Q@mail.gmail.com>

On Thu, Mar 18, 2021 at 09:58:56AM +0100, Han-Wen Nienhuys wrote:

> > 1) Not all updates make it to the reflogs
> > 2) Reflogs can be edited or mucked with
> > 3) On NFS reflogs can outright be wrong even when used properly as there are
> > caching issues. We specifically have seen entries that appear to be FFs that
> > were not.
> 
> Can you tell a little more about 3)? Since we don't annotate non-FF
> vs FF today, what does "appear to be FFs" mean?
> 
> But you are right: since the reflog for a branch is in a different
> file from the branch head, there is no way to do an update to both of
> them at the same time. I guess this will have to be a reftable-only
> feature.

Each individual reflog entry (in the branch reflog and the HEAD reflog)
should still be consistent, though. They give the "before" and "after"
object ids, and the ff-ness is an immutable property of those commit
ids.
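Because each entry carries its own before/after pair, fast-forward status can be recomputed after the fact with `git merge-base --is-ancestor`. A minimal sketch, assuming the files ref backend (reflogs stored as text under `$GIT_DIR/logs/`, one line per entry beginning `<old-oid> <new-oid>`, oldest first); `classify_reflog` is a hypothetical helper name, not a git command:

```sh
#!/bin/sh
# classify_reflog BRANCH: hypothetical helper, not part of git.
# Assumes the files backend, where each reflog line starts with
# "<old-oid> <new-oid> ..." and the oldest entry comes first.
classify_reflog() {
    logfile=$(git rev-parse --git-path "logs/refs/heads/$1") || return
    while read -r old new rest; do
        case $old in
        *[!0]*) ;;                           # a real "before" id
        *) echo "$new create"; continue ;;   # all zeros: ref was created
        esac
        # merge-base --is-ancestor exits 0 if $old is an ancestor of
        # $new (fast-forward), 1 if it is not (non-FF), and something
        # else on error, e.g. when the old commit has been pruned.
        git merge-base --is-ancestor "$old" "$new"
        case $? in
        0) echo "$new ff" ;;
        1) echo "$new non-ff" ;;
        *) echo "$new unknown" ;;
        esac
    done <"$logfile"
}
```

For example, `classify_reflog main` prints one `<new-oid> create|ff|non-ff|unknown` line per entry. This only tells you about entries that exist and are intact, so Martin's points 1) and 2) still apply.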

> > I believe that today git can do very fast reachability checks without opening
> > pack files by using some of its indexes (bitmap code or
> > https://git-scm.com/docs/commit-graph ?). It probably makes sense to add this
> > ability to jgit if that is what you need?
> 
> The bitmaps are generated by GC, and you can't GC all the time. JGit
> has support for bitmaps, and its support actually predates C-Git's
> support for it. (It was added to JGit by Colby Ranger who worked in
> Shawn's team).

Bitmaps can help with these checks, but we don't actually look at them
in most of the algorithms one might use for computing ancestry. One of
the reasons for that is that they often backfire as an optimization,
because:

  - as you note, they are often not up to date because they require a
    repack. So they won't help when asking about very recently added
    commits (which people tend to ask about more than ancient ones).

  - the bitmap file format doesn't have any index. So a reader has to
    scan the whole thing upon opening to decide which commits have
    bitmaps.
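To make the first point concrete: with stock git, a bitmap is written only by a repack that puts everything into one pack (`git repack -a -d -b`, where `-b` is `--write-bitmap-index`). A commit created afterwards is a loose object outside the bitmapped pack, so it has no bitmap until the next repack that rewrites it. A small sketch; `show_bitmap_staleness` is a hypothetical helper name:

```sh
#!/bin/sh
# show_bitmap_staleness: hypothetical helper, not part of git.
# Run inside a non-bare repository that has at least one commit.
show_bitmap_staleness() {
    # -a -d: repack everything into a single pack and drop the
    # now-redundant packs and loose objects; -b: write a .bitmap too.
    git repack -q -a -d -b
    ls "$(git rev-parse --git-path objects/pack)"/*.bitmap
    # This commit is written as a loose object, outside the bitmapped
    # pack, so the existing .bitmap file does not cover it.
    git commit -q --allow-empty -m "after repack"
    git count-objects
}
```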

For several years we had a patch at GitHub that checked for bitmaps
during "--contains" traversals. Even though it did sometimes backfire,
it was enough of a net win to be worth keeping, compared to actually
opening commit objects to follow their parent pointers. But with
commit-graphs, it was a strict loss, and we stopped using it entirely
last year. (We do still look at bitmaps for our branch ahead/behind
checks using a custom patch; I'm suspicious of its performance for the
same reasons, but we haven't dug carefully into it).
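The bitmap-based ahead/behind check described above is a custom GitHub patch. Stock git can compute the same counts with an ordinary commit traversal (which uses the commit-graph when one is present) via `git rev-list --left-right --count`. A sketch; `ahead_behind` is a hypothetical helper name:

```sh
#!/bin/sh
# ahead_behind BASE BRANCH: hypothetical helper, not part of git.
# "BASE...BRANCH" is the symmetric difference; with --left-right
# --count, rev-list prints "<only in BASE>\t<only in BRANCH>".
ahead_behind() {
    git rev-list --left-right --count "$1...$2" | {
        read -r behind ahead
        echo "ahead=$ahead behind=$behind"
    }
}
```

If `topic` has two commits that `main` lacks and `main` has one that `topic` lacks, `ahead_behind main topic` prints `ahead=2 behind=1`.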

But...

> I expect that the commit graph doesn't work for my intended use-case.

...I think commit-graphs are a big win here. They are more often kept up
to date, because they can be generated incrementally with effort
proportional to the number of new commits. And they make a big
difference if the traversal has to cover a lot of commits. E.g., here's
the most extreme case in git.git, checking ancestry of the oldest
commit:

  $ time git merge-base --is-ancestor e83c5163316f89bfbde7d9ab23ca2e25604af290 HEAD; echo $?

  real	0m0.014s
  user	0m0.008s
  sys	0m0.005s
  0

  $ time git -c core.commitgraph=false merge-base --is-ancestor e83c5163316f89bfbde7d9ab23ca2e25604af290 HEAD; echo $?

  real	0m0.398s
  user	0m0.369s
  sys	0m0.028s
  0

Of course most results won't be so dramatic, because they wouldn't have
to traverse many commits in the first place (so they are already pretty
fast with or without the commit-graph). But that 14ms should be an
upper bound for this repo. And naturally the savings scale with the
number of commits: in linux.git it's 43ms, compared to 8.7s without
commit-graphs.
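The incremental generation mentioned above is what split commit-graphs provide. A minimal sketch, assuming git 2.24 or later (where `--split` exists and `core.commitGraph` is on by default); `update_commit_graph` is a hypothetical helper name, and running it after pushes or fetches is one possible way to keep ancestry checks like the ones above on the fast path:

```sh
#!/bin/sh
# update_commit_graph: hypothetical helper name; assumes git >= 2.24.
update_commit_graph() {
    # --reachable walks all refs; --split writes a new layer holding only
    # commits the existing chain does not cover yet. Git may merge small
    # layers into larger ones to keep the chain short.
    git commit-graph write --reachable --split
}
```

After it runs, `objects/info/commit-graphs/commit-graph-chain` lists the layers, and commands such as `git merge-base --is-ancestor` read them automatically.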

-Peff
