From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_PASS, SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by dcvr.yhbt.net (Postfix) with ESMTP id 65D5B1F5AF for ; Tue, 30 Mar 2021 07:12:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230323AbhC3HLz (ORCPT ); Tue, 30 Mar 2021 03:11:55 -0400 Received: from cloud.peff.net ([104.130.231.41]:36732 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230248AbhC3HLu (ORCPT ); Tue, 30 Mar 2021 03:11:50 -0400 Received: (qmail 18076 invoked by uid 109); 30 Mar 2021 07:11:49 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Tue, 30 Mar 2021 07:11:49 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 22011 invoked by uid 111); 30 Mar 2021 07:11:50 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Tue, 30 Mar 2021 03:11:50 -0400 Authentication-Results: peff.net; auth=none Date: Tue, 30 Mar 2021 03:11:48 -0400 From: Jeff King To: Taylor Blau Cc: git@vger.kernel.org, avarab@gmail.com, dstolee@microsoft.com, gitster@pobox.com, jonathantanmy@google.com Subject: Re: [PATCH v3 08/16] midx: allow marking a pack as preferred Message-ID: References: <30194a6786bec51e0f41de0e6c855dc2297806c6.1615482270.git.me@ttaylorr.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Mar 29, 2021 at 05:15:12PM -0400, Taylor Blau wrote: > There are two solutions to the problem: > > - You could write the mtimes in the MIDX itself. This would give you a > single point of reference, and resolve the TOCTOU race I just > described. > > - Or, you could forget about mtimes entirely and let the MIDX dictate > the pack ordering itself. That resolves the race in a > similar-but-different way. > > Of the two, I prefer the latter, but I think it introduces functionality > that we don't necessarily need yet. Yeah, I'd strongly favor the latter over the former. The reason to go with the solution you have in this series is that it doesn't require changing anything in the on-disk midx format, and we think it is good enough. But once we are going to change the on-disk format, we might as well give the writing side as much flexibility as possible. Of course the mtimes themselves are really just numbers, so in a sense the two are really equivalent. ;) > That's because the objects within > the packs are still ordered as such, and so the compression we get in > the packs is just as good as it is for single-pack bitmaps. It's only at > the objects between pack boundaries that any runs of 1s or 0s might be > interrupted, but there are far fewer pack boundaries than objects, so it > doesn't seem to matter in practice. Right. The absolute worst case is a large number of single-object packs, in which case the bitmap order becomes essentially random with respect to history (because it would be sorted by sha1 of the packs). The effect _might_ be measurable in more real-world cases, like say one big pack and 100 pushes each with a handful of commits. The big pack would be in good shape, but you have a lot of extra pack boundaries that hurt the bitmap compression. But in practice, generating bitmaps is expensive enough that you'd probably want to roll up some of the packs anyway (and that is certainly what we are doing at GitHub, using your "repack --geometric"). So you'd end usually with one big pack representing most of history, and then a handful of roll-up packs. So I'm a little curious whether one could even measure the impact of, say, 100 little packs. But not enough to even run the experiment, because even that is not a case that is really that interesting. > Anyway, I think that you know all of that already (mostly because we > thought aloud together when I originally brought this up), but I figure > that this detail may be interesting for other readers, too. Indeed. And I know that you know everything I just wrote, but I agree it's nice to get a record of these discussions onto the list. :) -Peff