From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS53758 23.128.96.0/24 X-Spam-Status: No, score=-4.2 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_LOW, SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by dcvr.yhbt.net (Postfix) with ESMTP id 3D63B1F5AE for ; Fri, 18 Jun 2021 14:10:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234242AbhFROMM (ORCPT ); Fri, 18 Jun 2021 10:12:12 -0400 Received: from cloud.peff.net ([104.130.231.41]:60026 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229782AbhFROMK (ORCPT ); Fri, 18 Jun 2021 10:12:10 -0400 Received: (qmail 26923 invoked by uid 109); 18 Jun 2021 14:10:01 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 18 Jun 2021 14:10:01 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 15863 invoked by uid 111); 18 Jun 2021 14:10:00 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 18 Jun 2021 10:10:00 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 18 Jun 2021 10:10:00 -0400 From: Jeff King To: Patrick Steinhardt Cc: Derrick Stolee , git@vger.kernel.org Subject: Re: [PATCH] bitmaps: don't recurse into trees already in the bitmap Message-ID: References: <471cb9be-bb72-6a37-ede8-f9421d9d3ebe@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Fri, Jun 18, 2021 at 03:35:05PM +0200, Patrick Steinhardt wrote: > > More than once I've spent many head-scratching hours trying to determine > > why some real-world repo performs better or worse than expected. :) > > I couldn't agree more. I've also had my fair share of weird performance > characteristics in completely unexpected ways. Unfortunately, it has > made me become quite cautious about bitmaps given that they've already > caused their fair share of perfomance regressions. > > But your work here actually encourages me to give it another try soonish > and see what kind of repo shapes make them explode this time. Not really for immediate work, but I just wanted to note while I'm thinking about it: the two big things that can actually make bitmaps _worse_ than a regular traversal are: - loading the bitmaps is inherently linear in the file size. We could do better with an mmap-able index of bitmapped commits in the file. This would be easy to add as an extension to the file. This matters most for really small queries (the absolute time to parse the bitmap file isn't that high, but if the non-bitmap query can be answered in a few milliseconds, it becomes relatively more important). - we produce a definite answer of reachability for the bitmaps by filling in a bitmap for the negative (UNINTERESTING) tips, then one for the regular tips, and then taking a set difference. Whereas a normal traversal walks both sides together until it hits a merge base, and then actually looks into the trees in the boundary commits. This normal-traversal technique may look at fewer objects overall for some cases (especially if you have a lot of negative tips, like with "--not --all" and not-so-good bitmap coverage). But it does mean we may report an object as in the traversal when it's actually not (if it was added/removed in part of the uninteresting history, and so not present at the boundary). We could possibly teach the bitmap traversal to behave more like a normal one. I suspect this would clear up most of the regressions you'd see using bitmaps with rev-list. There is one other related case I've run into. If you put a bitmap on every commit, you'd _think_ that would make everything faster, since you'd never traverse at all. But there's an inflection point where you spend more time going through each bitmap twiddling bits (which remember is O(nr_of_objects * nr_bitmapped_commits) bits across all of them). And most of those bits will already be "1", so there's a point where the saved traversal is less than the cost of OR-ing bits. Anyway, this is all big-picture stuff that I don't expect you to work on or even respond to. I'm mostly just writing up some thoughts that I don't think have made it to the list in a coherent form. I'm not sure if I'll eventually work on some of them or not (but anybody who is interested can be my guest :) ). -Peff