From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_PASS, SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by dcvr.yhbt.net (Postfix) with ESMTP id A2DE21F9FC for ; Fri, 26 Mar 2021 07:44:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229589AbhCZHnq (ORCPT ); Fri, 26 Mar 2021 03:43:46 -0400 Received: from cloud.peff.net ([104.130.231.41]:49614 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229671AbhCZHno (ORCPT ); Fri, 26 Mar 2021 03:43:44 -0400 Received: (qmail 28933 invoked by uid 109); 26 Mar 2021 07:43:43 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 26 Mar 2021 07:43:43 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 8821 invoked by uid 111); 26 Mar 2021 07:43:45 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 26 Mar 2021 03:43:45 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 26 Mar 2021 03:43:43 -0400 From: Jeff King To: Han-Wen Nienhuys Cc: git Subject: Re: Distinguishing FF vs non-FF updates in the reflog? Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Mar 22, 2021 at 03:40:46PM +0100, Han-Wen Nienhuys wrote: > > I left some numbers in another part of the thread, but IMHO performance > > isn't that compelling a reason to do this these days, if you are using > > commit-graphs. > > > > Just walking the reflog might be _slightly_ faster, though not > > necessarily (it depends on whether the depth of the object graph or the > > depth of the reflog chain is deeper). It might matter more if you are > > using a more exotic storage scheme, where switching from accessing > > reflogs to objects implies extra round-trips to a server (e.g., custom > > storage backends with JGit; I don't know the state of the art in what > > Google is doing there). > > JGit doesn't currently support commit-graph, so it's hard to predict > what performance will be like, but isn't commit-graph is keyed by > SHA1? That makes it hard to do caching, especially when considering > large repositories. Yes, it's keyed by sha1. It's essentially replacing "inflate the commit object and parse it" with "here are the parsed values as mmap-able 32-bit integer fields" (there's some other stuff with generation numbers, too, but the main speedup is simply that accessing each commit is orders of magnitude cheaper). It caches well, because those properties of the commit are immutable. But if you meant "when pulling data from the commit-graph file, is it friendly to block cache", then no, it's not linear. You'd binary search within it to find each commit, just as you would a pack .idx (and just like a .idx, I'd expect a system that is pulling data from a network source to want to grab the whole commit-graph file. They tend to be much smaller than the main .idx for a given repo). > AFAIU, commit-graph would help speed up reachability checks, by being > able to shortcut cases where the commit number proves that some commit > is not ancestor of the other, but you still have to do a revwalk to > conclusively prove reachability. Right. You'll still walk a lot of the commits, but you'll do so much faster (the generation numbers can also help prune some uninteresting side paths, but again, I think the main value for this operation is just getting the parent info much faster). -Peff