From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: * X-Spam-ASN: AS3215 2.6.0.0/16 X-Spam-Status: No, score=1.3 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,LIST_MIRROR_RECEIVED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=no autolearn_force=no version=3.4.2 Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by dcvr.yhbt.net (Postfix) with ESMTP id 858FC1F670 for ; Mon, 21 Feb 2022 04:51:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343746AbiBUDMT (ORCPT ); Sun, 20 Feb 2022 22:12:19 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:57236 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236396AbiBUDMS (ORCPT ); Sun, 20 Feb 2022 22:12:18 -0500 Received: from mail-io1-xd31.google.com (mail-io1-xd31.google.com [IPv6:2607:f8b0:4864:20::d31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF9AB205F8 for ; Sun, 20 Feb 2022 19:11:56 -0800 (PST) Received: by mail-io1-xd31.google.com with SMTP id e79so14432417iof.13 for ; Sun, 20 Feb 2022 19:11:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=KPyiYxCegHZF4jLY1JDEu80glJkeiYNm//vrxnTqpwE=; b=fyObtL34+3TrOenulth7TKBvdwvAoYMxAvmYrpS1VvKCzREezdMPSF+X9IbFdmOo9p veMh9JL49EexbuH7Vu9cHilEKlRMe7/qa2DDc0jxyzYuZgCQgr63VIQ64AvwfKvxOCOw hg+g1Xix02Hq4PExLxjAD4oZgaNvF0A4KDkrioqka4NTQTcyC5rZB/0McqOq1oKHW2AJ NXcwnnwdwUzFAlIqcwCrilCKnu8QnLa1scXuujQic3AuameIhZ/b6SXWD82wvDpSu52s hCs0A9+R80TQqLIunux26T88wgYzAgxBPLMSEJJzILeVaddlOcmXiGV0WsT4/8RQP3Mt 550w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=KPyiYxCegHZF4jLY1JDEu80glJkeiYNm//vrxnTqpwE=; b=sQ7aAQoi2L7Rzzfy7XqKKyv/cH66BAyXIo461ALLOKRZG22vXXoS7I5hV+0fFQ/UDG 1HG9tMJFh+zYT59ZY33RxbKffHEauCTfdGn28kKOY/iXrdsuOZjFQt1FMM69Xr6U/ab2 7eQl+2fffFJqS0+5QW/Vzj3AZe1CdR6DtOC65Lr4z1yUZsV/SUNw2iwZoAayxnrnN5f7 2/RfIeUjwYbgHJZCRdt6BZcBJmaW2oSXmJxr/tQvxIOs4nri1imVpoyB1peVDJbchVjp gL2aYucECzLWpdKxUx2VnGSYgfcn+t/wzL1DHeGVCZa/cvE8hEhtZqG5SHRu8PqRulpk MuKg== X-Gm-Message-State: AOAM530QANQVWZFbN0eqP22iFpdkyXTx1gKomA8h3N+3KUj6R1WNp8+t L02/s8gYyR28xRfll9smDHKN0DpayCt2a2oa X-Google-Smtp-Source: ABdhPJzQoRe2GOXy20mvhC1y+lT95ER187SQrq8kVMakhSJ+sOzs/NbyztnvRsWlw9evEF5EDNkxTw== X-Received: by 2002:a5d:84c5:0:b0:60b:bd34:bb6f with SMTP id z5-20020a5d84c5000000b0060bbd34bb6fmr14289332ior.32.1645413116229; Sun, 20 Feb 2022 19:11:56 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id c13sm95874ilm.2.2022.02.20.19.11.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 20 Feb 2022 19:11:55 -0800 (PST) Date: Sun, 20 Feb 2022 22:11:55 -0500 From: Taylor Blau To: John Cai Cc: Robert Coup , John Cai via GitGitGadget , git@vger.kernel.org, Derrick Stolee , Christian Couder Subject: Re: [PATCH v2 0/4] [RFC] repack: add --filter= Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Wed, Feb 16, 2022 at 04:07:14PM -0500, John Cai wrote: > > I don't know whether that is just around naming (--delete-filter / > > --drop-filter / > > --expire-filter ?), and/or making the documentation very explicit that > > this isn't so > > much "omitting certain objects from a packfile" as irretrievably > > deleting objects. > > Yeah, making the name very clear (I kind of like --delete-filter) would certainly help. > Also, to have more protection we can either > > 1. add a config value that needs to be set to true for repack to remove > objects (repack.allowDestroyFilter). > > 2. --filter is dry-run by default and prints out objects that would have been removed, > and it has to be combined with another flag --destroy in order for it to actually remove > objects from the odb. I share the same concern as Robert and Stolee do. But I think this issue goes deeper than just naming. Even if we called this `git repack --delete-filter` and only ran it with `--i-know-what-im-doing` flag, we would still be leaving repository corruption on the table, just making it marginally more difficult to achieve. I'm not familiar enough with the proposal to comment authoritatively, but it seems like we should be verifying that there is a promisor remote which promises any objects that we are about to filter out of the repository. I think that this is basically what `pack-objects`'s `--missing=allow-promisor` does, though I don't think that's the right tool for this job, either. Because we pack-objects also knows the object filter, by the time we are ready to construct a pack, we're traversing the filtered list of objects. So we don't even bother to call show_object (or, in this case, builtin/pack-objects.c::show_objecT__ma_allow_promisor) on them. So I wonder what your thoughts are on having pack-objects only allow an object to get "filtered out" if a copy of it is promised by some promisor remote. Alternatively, and perhaps a more straight-forward option might be to have `git repack` look at any objects that exist in a pack we're about to delete, but don't exist in any of the packs we are going to leave around, and make sure that any of those objects are either unreachable or exist on a promisor remote. But as it stands right now, I worry that this feature is too easily misused and could result in unintended repository corruption. I think verifying that that any objects we're about to delete exist somewhere should make this safer to use, though even then, I think we're still open to a TOCTOU race whereby the promisor has the objects when we're about to delete them (convincing Git that deleting those objects is OK to do) but gets rid of them after objects have been deleted from the local copy (leaving no copies of the object around). So, I don't know exactly what the right path forward is. But I'm curious to get your thoughts on the above. Thanks, Taylor