From: Jeff King <peff@peff.net>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: Git List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH 09/10] fast-export: allow seeding the anonymized mapping
Date: Wed, 24 Jun 2020 11:47:40 -0400
Message-ID: <20200624154740.GA2088459@coredump.intra.peff.net>
In-Reply-To: <CAPig+cTFkAOyLG1Sm_p11GgH9Ms87_7zs-7kFbEYZ-uXg1yrYw@mail.gmail.com>

On Tue, Jun 23, 2020 at 04:30:23PM -0400, Eric Sunshine wrote:

> > I'm not sure what you'd write, then. You can't mention "mybranch"
> > anymore if it was anonymized. Are you suggesting to make the example:
> >
> > git rev-list -- foo.c
> >
> > by itself?
>
> Sorry, I meant to provide an example like this:
>
> For example, if you have a bug which reproduces with `git rev-list
> sensitive -- secret.c`, you can run:
>
>   $ git fast-export --anonymize --all \
>       --seed-anonymized=sensitive:foo \
>       --seed-anonymized=secret.c:bar.c \
>       >stream
>
> After importing the stream, you can then run `git rev-list foo --
> bar.c` in the anonymized repository.

Thanks, that makes sense. I took this as-is for my reroll (modulo the
change of option name discussed elsewhere).

> Hmm, perhaps your original attempt can be extended slightly to state
> it more explicitly?
>
> Note that paths and refnames are split into tokens at slash
> boundaries. The command above would anonymize `subdir/foo.c` as
> something like `path123/secret.c`; you could then search for
> `secret.c` in the anonymized repository to determine the final
> pathname.
>
> To make referencing the final pathname simpler, you can seed
> anonymization for each path component; so, if you also anonymize
> `subdir` to `publicdir`, then the final pathname would be
> `publicdir/secret.c`.

Thanks, I took this modulo some fixups to match the example above, and
to avoid the use of the word "seed" based on our other discussion.

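
The per-component behavior described in the quoted paragraph can be modeled with a short Python sketch. This is only an illustration, not git's C implementation: `make_anonymizer` is a hypothetical helper, and the `pathN` counter scheme is an assumption loosely based on the `path123` example above.

```python
from itertools import count


def make_anonymizer(seeds=None):
    """Return a function that anonymizes a path one component at a time."""
    # component -> replacement, pre-filled with any user-supplied seeds
    mapping = dict(seeds or {})
    # source of generated names; the real naming scheme is an assumption here
    counter = count()

    def anonymize_path(path):
        out = []
        # Paths are split into tokens at slash boundaries.
        for token in path.split("/"):
            if token not in mapping:
                mapping[token] = f"path{next(counter)}"
            out.append(mapping[token])
        return "/".join(out)

    return anonymize_path
```

Seeding only `foo.c` leaves `subdir` with a generated name, while seeding both components makes the whole anonymized pathname predictable.
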
> This makes me wonder if --seed-anonymized should do its own
> tokenization so that --seed-anonymized=subdir/foo:public/bar is
> automatically understood as anonymizing "subdir" to "public" _and_
> "foo" to "bar". But that potentially gets weird if you say:
>
> --seed-anonymized=a/b:q/p --seed-anonymized=a/c:y/z
>
> in which case you've given conflicting replacements for "a". (I
> suppose it could issue a warning message in that case.)

Right, I think you get into weird corner cases. Another issue is that
not all items are tokenized (e.g., if your author name was foo/bar,
you'd want that replaced as a whole). Probably you could add both the
broken-down and full inputs. Yet another issue is that you can't add a
token with a ":" due to the syntax.

This is an infrequently-enough-used feature that I think it's worth
keeping things simple, even if they're a little less convenient to
invoke.

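
For illustration, here is a Python sketch of the automatic tokenization floated in the quoted text, which this reply decides against. `expand_seeds` is a hypothetical name, and the "first mapping wins, later ones are reported" policy is an assumption. The sketch shows how `a/b:q/p` and `a/c:y/z` collide on `a`.

```python
def expand_seeds(specs):
    """Split each "from:to" spec into per-component mappings.

    Returns (mapping, conflicts), where conflicts is a list of
    (token, kept_replacement, rejected_replacement) tuples.
    """
    mapping, conflicts = {}, []
    for spec in specs:
        # partition() splits at the first ":", so a token that itself
        # contains ":" cannot be expressed in this syntax.
        src, sep, dst = spec.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {spec!r}")
        src_tokens, dst_tokens = src.split("/"), dst.split("/")
        if len(src_tokens) != len(dst_tokens):
            raise ValueError(f"component counts differ in {spec!r}")
        for old, new in zip(src_tokens, dst_tokens):
            # Assumed policy: keep the first replacement, record later ones.
            kept = mapping.setdefault(old, new)
            if kept != new:
                conflicts.append((old, kept, new))
    return mapping, conflicts
```

Note that a whole-string value such as an author name `foo/bar` would also be broken apart by this scheme, which is one of the corner cases mentioned above.
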
> Lack of a warning or error could be kind of bad if the person doesn't
> check the fast-export file before sending it out and only discovers
> later that:
>
> git fast-export --seed-anonymized=foo:bar
>
> didn't perform _any_ anonymization at all.

Good point. I'd hope people would glance at the output before sending it
out, but given that it's a potential safety issue, it probably is worth
detecting this case. I'll add it to my re-roll.

-Peff
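
The check being agreed to here can be sketched in Python as follows. `check_options` and its error text are hypothetical. The message does not say how git reports the condition, only that a mapping given without `--anonymize` should be detected rather than silently ignored.

```python
def check_options(anonymize, seeds):
    """Reject a seeded mapping when anonymization itself is not enabled."""
    if seeds and not anonymize:
        # Without this check the export would proceed with no
        # anonymization at all, which is the safety issue raised above.
        raise ValueError("mapping given without --anonymize")
```
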