From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v3] teach fast-export an --anonymize option
Date: Thu, 28 Aug 2014 08:32:58 -0400
Message-ID: <20140828123257.GA18642@peff.net>
In-Reply-To: <CACsJy8B3gFC7kLf-cLhAk3BgQ+v427rMXWHTqjU4LYP3NQte7Q@mail.gmail.com>
On Thu, Aug 28, 2014 at 05:30:44PM +0700, Duy Nguyen wrote:
> On Thu, Aug 28, 2014 at 12:01 AM, Jeff King <peff@peff.net> wrote:
> > You can get an overview of what will be shared
> > by running a command like:
> >
> > git fast-export --anonymize --all |
> > perl -pe 's/\d+/X/g' |
> > sort -u |
> > less
> >
> > which will show every unique line we generate, modulo any
> > numbers (each anonymized token is assigned a number, like
> > "User 0", and we replace it consistently in the output).
>
> I feel like this should be part of git-fast-export.txt, just to
> increase the user's confidence in the tool (and I don't expect most
> users to read this commit message).
Hmph. Whenever I say "I think this patch is done", suddenly the comments
start pouring in. :)

I think you are right, though, and we could stand to explain
the feature a little more in the documentation in general.

How about this patch on top (or squashed in):
-- >8 --
Subject: docs/fast-export: explain --anonymize more completely
The original commit made mention of this option, but not why
one might want it or how they might use it. Let's try to be
a little more thorough, and also explain how to confirm that
the output really is anonymous.
Signed-off-by: Jeff King <peff@peff.net>
---
Documentation/git-fast-export.txt | 63 ++++++++++++++++++++++++++++++++++++---
1 file changed, 59 insertions(+), 4 deletions(-)
diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index 52831fa..dbe9a46 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -106,10 +106,9 @@ marks the same across runs.
different from the commit's first parent).
--anonymize::
- Replace all refnames, paths, blob contents, commit and tag
- messages, names, and email addresses in the output with
- anonymized data, while still retaining the shape of history and
- of the stored tree.
+ Anonymize the contents of the repository while still retaining
+ the shape of the history and stored tree. See the section on
+ `ANONYMIZING` below.
--refspec::
Apply the specified refspec to each ref exported. Multiple of them can
@@ -147,6 +146,62 @@ referenced by that revision range contains the string
'refs/heads/master'.
+ANONYMIZING
+-----------
+
+If the `--anonymize` option is given, git will attempt to remove all
+identifying information from the repository while still retaining enough
+of the original tree and history patterns to reproduce some bugs. The
+goal is that a git bug which is found on a private repository will
+persist in the anonymized repository, and the latter can be shared with
+git developers to help solve the bug.
+
+With this option, git will replace all refnames, paths, blob contents,
+commit and tag messages, names, and email addresses in the output with
+anonymized data. Two instances of the same string will be replaced
+equivalently (e.g., two commits with the same author will have the same
+anonymized author in the output, but bear no resemblance to the original
+author string). The relationship between commits, branches, and tags is
+retained, as well as the commit timestamps (but the commit messages and
+refnames bear no resemblance to the originals). The relative makeup of
+the tree is retained (e.g., if you have a root tree with 10 files and 3
+trees, so will the output), but their names and the contents of the
+files will be replaced.
+
+If you think you have found a git bug, you can start by exporting an
+anonymized stream of the whole repository:
+
+---------------------------------------------------
+$ git fast-export --anonymize --all >anon-stream
+---------------------------------------------------
+
+Then confirm that the bug persists in a repository created from that
+stream (many bugs will not, as they really do depend on the exact
+repository contents):
+
+---------------------------------------------------
+$ git init anon-repo
+$ cd anon-repo
+$ git fast-import <../anon-stream
+$ ... test your bug ...
+---------------------------------------------------
+
+If the anonymized repository shows the bug, it may be worth sharing
+`anon-stream` along with a regular bug report. Note that the anonymized
+stream compresses very well, so gzipping it is encouraged. If you want
+to examine the stream to see that it does not contain any private data,
+you can peruse it directly before sending. You may also want to try:
+
+---------------------------------------------------
+$ perl -pe 's/\d+/X/g' <anon-stream | sort -u | less
+---------------------------------------------------
+
+which shows all of the unique lines (with numbers converted to "X", to
+collapse "User 0", "User 1", etc into "User X"). This produces a much
+smaller output, and it is usually easy to quickly confirm that there is
+no private data in the stream.
+
+
Limitations
-----------
--
2.1.0.346.ga0367b9
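
To illustrate what the number-collapsing check in the new documentation does,
here is a minimal sketch that can be run without a real repository. It
assumes a POSIX shell and a sed that supports -E, which is swapped in here
for the perl one-liner from the patch. The names collapse_numbers and sample
are hypothetical helpers, and the sample lines are hand-written stand-ins,
not actual fast-export output.

```shell
# Hypothetical helper: replace every run of digits with "X", then keep
# only the unique lines. LC_ALL=C makes the sort order deterministic.
collapse_numbers() {
	sed -E 's/[0-9]+/X/g' | LC_ALL=C sort -u
}

# Hand-written sample lines (an assumption, not real output): two
# different anonymized authors and one file-modify line.
sample() {
	printf '%s\n' \
		'author User 0 <user0@example.com> 1409229178 -0400' \
		'author User 1 <user1@example.com> 1409229200 -0400' \
		'M 100644 :3 path0/path1'
}

# The two author lines collapse into one, so two unique lines remain.
sample | collapse_numbers
```

On a real stream you would feed anon-stream to the helper instead of the
sample, and page through the result as the documentation suggests.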