From: Jeff King
Date: Tue, 16 Jun 2020 08:58:33 -0400
To: Junio C Hamano via GitGitGadget
Cc: git@vger.kernel.org, don@goodman-wilson.com, stolee@gmail.com,
    sandals@crustytoothpaste.net, Matt Rogers, Eric Sunshine, Taylor Blau,
    Phillip Wood, Alban Gruin, Johannes Sixt, Johannes Schindelin,
    Junio C Hamano
Subject: Re: [PATCH v2 01/12] fast-export: do anonymize the primary branch name
Message-ID: <20200616125833.GE666057@coredump.intra.peff.net>
X-Mailing-List: git@vger.kernel.org

On Mon, Jun 15, 2020 at 12:50:05PM +0000, Junio C Hamano via GitGitGadget wrote:

> There is a comment that explains why it is OK to leave 'master'
> unanonymized (because everybody calls the primary branch 'master'
> and it is no secret), but that does not justify why it is bad to
> anonymize 'master' and make it indistinguishable from other
> branches. Assuming there _is_ a need to allow the readers of the
> output to tell where the tip of the primary branch is, let's keep
> the special casing of 'master', but still anonymize it to "ref0".
> Because all other branches will be given ref+N where N is a positive
> integer, this will keep the primary branch identifiable in the
> output stream, without exposing what the name of the primary branch
> is in the repository the export stream was taken from.

I think this is fine. The reason I left "master" as-is in the original
is that it is potentially helpful to have an idea of its specialness
when reproducing a traversal in the anonymized repo. I.e., if you know
that a bug is shown by "git rev-list master~17..master~3", then you can
reproduce it with the same command in the anonymized repo. Losing any
idea of where the primary branch is would make that impossible. But with
this patch, you can swap it out for "ref0~17", etc, which is OK.

Of course that only helps you for _one_ branch. A more generally useful
mechanism would be to teach fast-export to write the ref mapping (and
perhaps file mappings, etc) to a separate file. Then you could convert
any reproduction recipe to use the anonymized names, and share only that
recipe along with the anonymized dump.

But that's _way_ outside the scope of your series. This seems like a
good interim step to retain the status quo.

-Peff
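
Illustration of the mapping idea discussed above, as a rough sketch only.
It assumes the naming scheme described in the quoted text (the primary
branch becomes "ref0", every other branch gets "refN" with N >= 1) and
shows how a reproduction recipe could be rewritten if a ref mapping were
available. The function names build_ref_mapping and translate_recipe are
hypothetical; nothing here is part of git or of fast-export, and the
order in which fast-export would number branches is an assumption.

```python
import re


def build_ref_mapping(branches, primary="master"):
    """Assign "ref0" to the primary branch and "ref1", "ref2", ... to the rest.

    Hypothetical helper: numbering follows the order of `branches`,
    which is an assumption made for this sketch.
    """
    mapping = {}
    if primary in branches:
        mapping[primary] = "ref0"
    counter = 1
    for name in branches:
        if name in mapping:
            continue  # primary branch or a duplicate entry
        mapping[name] = f"ref{counter}"
        counter += 1
    return mapping


def translate_recipe(recipe, mapping):
    """Rewrite branch names in a command line to their anonymized names.

    Naive matching: a name is replaced when it is not glued to another
    word character, "/" or "-". That keeps "rev-list" intact while still
    matching around "~", "^" and the ".." range operator. A branch named
    like the command itself (e.g. "git") would be rewritten too.
    """
    if not mapping:
        return recipe
    # Try longer names first so "topic/foo" wins over "topic".
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w/-])(" + "|".join(re.escape(n) for n in names) + r")(?![\w/-])"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], recipe)


if __name__ == "__main__":
    refs = build_ref_mapping(["topic", "master", "fix"])
    print(translate_recipe("git rev-list master~17..master~3", refs))
```

With a mapping like this, only the translated recipe and the anonymized
dump would need to be shared; the mapping itself stays private.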