From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 3EC302018E for ; Mon, 8 Aug 2016 16:51:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752152AbcHHQvb (ORCPT ); Mon, 8 Aug 2016 12:51:31 -0400 Received: from cloud.peff.net ([104.130.231.41]:51267 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1752123AbcHHQva (ORCPT ); Mon, 8 Aug 2016 12:51:30 -0400 Received: (qmail 16673 invoked by uid 109); 8 Aug 2016 16:51:30 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Mon, 08 Aug 2016 16:51:30 +0000 Received: (qmail 6641 invoked by uid 111); 8 Aug 2016 16:51:29 -0000 Received: from Unknown (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Mon, 08 Aug 2016 12:51:29 -0400 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 08 Aug 2016 12:51:28 -0400 Date: Mon, 8 Aug 2016 12:51:28 -0400 From: Jeff King To: Junio C Hamano Cc: git@vger.kernel.org, Michael Haggerty Subject: Re: [PATCH v2 7/7] pack-objects: use mru list when iterating over packs Message-ID: <20160808165127.fvhnkcfsj4vif7iu@sigill.intra.peff.net> References: <20160729040422.GA19678@sigill.intra.peff.net> <20160729041524.GG22408@sigill.intra.peff.net> <20160729054536.GA27343@sigill.intra.peff.net> <20160808145042.uwrk2m6jq3m4li37@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Aug 08, 2016 at 09:28:29AM -0700, Junio C Hamano wrote: > Jeff King writes: > > > Here's a list of approaches I think we can use to fix this: > > > > 1. squelch the warning and ignore it. The downside here, besides not > > warning the user about true in-pack cycles, is that we have no > > opportunity to actually find a new delta (because we realize the > > problem only after the delta-compression phase). > > > > My test repository is a bad packing of all of the forks of > > torvalds/linux, with 3600 packs. I'm happy to share it if anybody > > wants to see it, but note that it is 11GB. > > > > The space overhead of the resulting pack in this case is ~3.2% > > (versus a pack generated by the original code, using the static > > pack order). Which is really not that bad, but I'm sure there are > > more pathological cases (i.e., there were on the order of hundreds > > or maybe thousands of cycles that needed broken, out of about 15 > > million total objects; but one could imagine the worst case as > > nr_of_objects/2). > > ... > > > > So I dunno. I really like the MRU approach if we can salvage it. > > I think I share the same feeling. As long as the chance is small > enough that the pack reordering creates a new cycle, the resulting > pack would not become too bloated by the last-ditch cycle breaking > code and finding a replacement delta instead of inflating it may not > be worth the trouble. Let me see how complicated it is to do the cycle-detection earlier. It really shouldn't be that hard (we don't even need an auxiliary data structure; we can just smudge the index value like write_object does). I'm very curious to see the pack resulting pack size. > It worries me a lot to lose the warning unconditionally, though. > That's the (only) coal-mine canary that lets us notice a problem > when we actually start hitting that last-ditch cycle breaking too > often. The dedicated cycle-detection will lose that warning, too (it doesn't touch that code, but it's effectively checking the same thing earlier). I agree it's unfortunate to lose. On the other hand, it is being lost because we are correctly handling the cycles, and there is nothing to warn about. So it ceases to be a problem, and starts being a normal, acceptable thing. -Peff