Date: Fri, 29 Jan 2021 18:00:30 -0500
From: Jeff King
To: Taylor Blau
Cc: Junio C Hamano, git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH 03/10] builtin/pack-objects.c: learn '--assume-kept-packs-closed'
References: <2da42e9ca26c9ef914b8b044047d505f00a27e20.1611098616.git.me@ttaylorr.com>

On Fri, Jan 29, 2021 at 05:53:58PM -0500, Taylor Blau wrote:

> > I'm still thinking aloud here, and not really sure which is a better
> > path. I do feel like the failure modes for the second one are less
> > risky.
>
> The more I think about it, the more I feel that the second option is the
> right approach. It seems like if you were naïvely implementing this from
> scratch, that you'd pick the second one (i.e., have pack-objects
> understand a new input mode, and then make a pack based on that).
>
> I am leery that we'd be able to get the first option "right" without
> attaching some sort of marker to each pack, especially given how
> difficult I think that this is to reason about precisely. I suppose you
> could have a .closed file corresponding to each pack, or alternatively a
> $objdir/pack/pack-geometry file which specifies the same thing, but both
> of these feel overly restrictive.

Yeah, I think my gut feeling matches yours.

> Besides having to special case the loose objects, is there any downside
> to doing the simpler thing here?

The other downside I can think of is that you can't just run "git repack
--geometric" every time, and eventually get a good result (or one that
asymptotically approaches good ;) ). I.e., you now have two types of
repacks: quick and dirty rollups, and "real" ones that do reachability.
So you need some heuristics about how often you do one versus the other.

I'm definitely OK with that outcome. And I think we could even bake
those heuristics into a script or mode of repack (e.g., maybe "gc
--auto" would trigger a bigger repack every N times or something). But
that's what I came up with by brainstorming. :)

-Peff
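
The "bigger repack every N times" idea could be sketched as a small
counter-based policy. This is only an illustration, not anything git
does today: the function names, the "geometric"/"full" labels, and the
default of 10 are all made up here, and no git command is invoked.

```python
def choose_repack(quick_since_full: int, full_every: int = 10) -> str:
    """Pick the kind of repack to run next (hypothetical policy).

    quick_since_full: quick rollups done since the last full repack.
    full_every: every N-th repack is a full, reachability-based one.
    """
    if full_every < 1:
        raise ValueError("full_every must be at least 1")
    # After full_every - 1 quick rollups, the next repack is the full one.
    if quick_since_full >= full_every - 1:
        return "full"
    return "geometric"


def simulate(runs: int, full_every: int = 10) -> list[str]:
    """Return the sequence of repack kinds chosen over `runs` invocations."""
    quick_since_full = 0
    plan = []
    for _ in range(runs):
        kind = choose_repack(quick_since_full, full_every)
        plan.append(kind)
        # A full repack resets the counter; a quick rollup increments it.
        quick_since_full = 0 if kind == "full" else quick_since_full + 1
    return plan
```

With full_every=3, simulate(6, 3) picks two quick rollups and then a
full repack, twice over. A real heuristic would likely also weigh
things like the number of loose objects rather than a bare count.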