From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 18 Mar 2021 15:35:10 -0400
From: Jeff King
To: Han-Wen Nienhuys
Cc: Martin Fick, git
Subject: Re: Distinguishing FF vs non-FF updates in the reflog?
References: <5359503.g8GvsOHjsp@mfick-lnx>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

On Thu, Mar 18, 2021 at 09:58:56AM +0100, Han-Wen Nienhuys wrote:

> > 1) Not all updates make it to the reflogs
> > 2) Reflogs can be edited or mucked with
> > 3) On NFS reflogs can outright be wrong even when used properly as there are
> > caching issues. We specifically have seen entries that appear to be FFs that
> > were not.
>
> Can you tell a little more about 3) ? Since we don't annotate non-FF
> vs FF today, what does "appear to be FFs" mean?
>
> But you are right: since the reflog for a branch is in a different
> file from the branch head, there is no way to do an update to both of
> them at the same time. I guess this will have to be a reftable-only
> feature.

Each individual reflog entry (in the branch reflog and the HEAD reflog)
should still be consistent, though. They give the "before" and "after"
object ids, and the ff-ness is an immutable property of those commit
ids.

> > I believe that today git can do very fast reachability checks without opening
> > pack files by using some of its indexes (bitmap code or
> > https://git-scm.com/docs/commit-graph ?). It probably makes sense to add
> > this ability to jgit if that is what you need?
>
> The bitmaps are generated by GC, and you can't GC all the time. JGit
> has support for bitmaps, and its support actually predates C-Git's
> support for it. (It was added to JGit by Colby Ranger who worked in
> Shawn's team).

Bitmaps can help with these checks, but we don't actually look at them
in most of the algorithms one might use for computing ancestry. One of
the reasons for that is that they often backfire as an optimization,
because:

  - as you note, they are often not up to date because they require a
    repack.
    So they won't help when asking about very recently added commits
    (which people tend to ask about more than ancient ones).

  - the bitmap file format doesn't have any index. So a reader has to
    scan the whole thing upon opening to decide which commits have
    bitmaps.

For several years we had a patch at GitHub that checked for bitmaps
during "--contains" traversals. Even though it did sometimes backfire,
it was enough of a net win to be worth keeping, compared to actually
opening commit objects to follow their parent pointers. But with
commit-graphs, it was a strict loss, and we stopped using it entirely
last year.

(We do still look at bitmaps for our branch ahead/behind checks using a
custom patch; I'm suspicious of its performance for the same reasons,
but we haven't dug carefully into it).

But...

> I expect that the commit graph doesn't work for my intended use-case.

...I think commit-graphs are a big win here. They are more often kept up
to date, because they can be generated incrementally with effort
proportional to the number of new commits. And they make a big
difference if the traversal has to cover a lot of commits.

E.g., here's the most extreme case in git.git, checking ancestry of the
oldest commit:

  $ time git merge-base --is-ancestor e83c5163316f89bfbde7d9ab23ca2e25604af290 HEAD; echo $?

  real	0m0.014s
  user	0m0.008s
  sys	0m0.005s
  0

  $ time git -c core.commitgraph=false merge-base --is-ancestor e83c5163316f89bfbde7d9ab23ca2e25604af290 HEAD; echo $?

  real	0m0.398s
  user	0m0.369s
  sys	0m0.028s
  0

Of course most results won't be so dramatic, because they wouldn't have
to traverse many commits in the first place (so they are already pretty
fast with or without the commit-graph). But that 14ms should be an upper
bound for this repo. And naturally that scales with the number of
commits (in linux.git it's 43ms, compared to 8.7s without
commit-graphs).

-Peff
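
A minimal sketch of the point above that ff-ness can be derived from the
"before" and "after" object ids of a single reflog entry. The function
name classify_update and its output labels are hypothetical, chosen only
for illustration. It assumes both ids name commits that still exist in
the repository, and it treats an all-zero id as a ref creation or
deletion.

```shell
# Hypothetical helper: classify one ref update from its old and new ids.
# Prints one of: create, delete, ff, non-ff, unknown.
classify_update() {
    old=$1
    new=$2

    # An id made up only of zeros means the ref did not exist on that side.
    case $old in *[!0]*) ;; *) echo create; return 0 ;; esac
    case $new in *[!0]*) ;; *) echo delete; return 0 ;; esac

    # The update is a fast-forward exactly when old is an ancestor of new.
    # merge-base --is-ancestor exits 0 for "yes" and 1 for "no"; any
    # other status is an error (e.g., a missing object).
    if git merge-base --is-ancestor "$old" "$new"; then
        echo ff
    else
        status=$?
        if [ "$status" -eq 1 ]; then
            echo non-ff
        else
            echo unknown
            return "$status"
        fi
    fi
}
```

For example, the most recent update of HEAD could be checked with:

  $ classify_update "$(git rev-parse 'HEAD@{1}')" "$(git rev-parse HEAD)"

Because the answer depends only on the two commit ids, it can be computed
at any later time and does not need to be stored alongside the entry.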