From: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
To: Jeff King <peff@peff.net>, Duy Nguyen <pclouds@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v3] teach fast-export an --anonymize option
Date: Thu, 28 Aug 2014 17:46:15 +0100
Message-ID: <53FF5CD7.8040603@ramsay1.demon.co.uk>
In-Reply-To: <20140828123257.GA18642@peff.net>

On 28/08/14 13:32, Jeff King wrote:
> On Thu, Aug 28, 2014 at 05:30:44PM +0700, Duy Nguyen wrote:
> 
>> On Thu, Aug 28, 2014 at 12:01 AM, Jeff King <peff@peff.net> wrote:
>>> You can get an overview of what will be shared
>>> by running a command like:
>>>
>>>   git fast-export --anonymize --all |
>>>   perl -pe 's/\d+/X/g' |
>>>   sort -u |
>>>   less
>>>
>>> which will show every unique line we generate, modulo any
>>> numbers (each anonymized token is assigned a number, like
>>> "User 0", and we replace it consistently in the output).
>>
>> I feel like this should be part of git-fast-export.txt, just to
>> increase the user's confidence in the tool (and I don't expect most
>> users to read this commit message).
> 
> Hmph. Whenever I say "I think this patch is done", suddenly the comments
> start pouring in. :)

:-D
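
For anyone without perl at hand, the overview pipeline quoted above (perl -pe 's/\d+/X/g' | sort -u) can be approximated in Python. This is only a sketch. `overview` is a hypothetical helper name. Python's `sorted` orders by code point, so the line order may differ from `sort` under a non-C locale.

```python
import re
import sys


def overview(stream_lines):
    """Approximate `perl -pe 's/\\d+/X/g' | sort -u`.

    Every run of digits becomes "X", so "User 0" and "User 1" collapse
    into one line. The distinct results are returned sorted.
    """
    normalized = {re.sub(r"\d+", "X", line.rstrip("\n")) for line in stream_lines}
    return sorted(normalized)


if __name__ == "__main__":
    # Usage (assumed): git fast-export --anonymize --all | python overview.py | less
    for line in overview(sys.stdin):
        print(line)
```

The numbers are collapsed because each anonymized token carries its own counter. Without that step the unique-line list would be as long as the stream itself.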

> I think you are right, though, and we could stand to explain
> the feature a little more in the documentation in general.
> How about this patch on top (or squashed in):
> 
> -- >8 --
> Subject: docs/fast-export: explain --anonymize more completely
> 
> The original commit made mention of this option, but not why
> one might want it or how they might use it. Let's try to be
> a little more thorough, and also explain how to confirm that
> the output really is anonymous.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  Documentation/git-fast-export.txt | 63 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 59 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
> index 52831fa..dbe9a46 100644
> --- a/Documentation/git-fast-export.txt
> +++ b/Documentation/git-fast-export.txt
> @@ -106,10 +106,9 @@ marks the same across runs.
>  	different from the commit's first parent).
>  
>  --anonymize::
> -	Replace all refnames, paths, blob contents, commit and tag
> -	messages, names, and email addresses in the output with
> -	anonymized data, while still retaining the shape of history and
> -	of the stored tree.
> +	Anonymize the contents of the repository while still retaining
> +	the shape of the history and stored tree.  See the section on
> +	`ANONYMIZING` below.
>  
>  --refspec::
>  	Apply the specified refspec to each ref exported. Multiple of them can
> @@ -147,6 +146,62 @@ referenced by that revision range contains the string
>  'refs/heads/master'.
>  
>  
> +ANONYMIZING
> +-----------
> +
> +If the `--anonymize` option is given, git will attempt to remove all
> +identifying information from the repository while still retaining enough
> +of the original tree and history patterns to reproduce some bugs. The
> +goal is that a git bug which is found on a private repository will

s/goal/hope/ ;-)

> +persist in the anonymized repository, and the latter can be shared with
> +git developers to help solve the bug.
> +
> +With this option, git will replace all refnames, paths, blob contents,
> +commit and tag messages, names, and email addresses in the output with
> +anonymized data.  Two instances of the same string will be replaced
> +equivalently (e.g., two commits with the same author will have the same
> +anonymized author in the output, but bear no resemblance to the original
> +author string). The relationship between commits, branches, and tags is
> +retained, as well as the commit timestamps (but the commit messages and
> +refnames bear no resemblance to the originals). The relative makeup of
> +the tree is retained (e.g., if you have a root tree with 10 files and 3
> +trees, so will the output), but their names and the contents of the
> +files will be replaced.
> +
> +If you think you have found a git bug, you can start by exporting an
> +anonymized stream of the whole repository:
> +
> +---------------------------------------------------
> +$ git fast-export --anonymize --all >anon-stream
> +---------------------------------------------------
> +
> +Then confirm that the bug persists in a repository created from that
> +stream (many bugs will not, as they really do depend on the exact
> +repository contents):

Dumb question (I have not even read the patch, so please just ignore me
if this is indeed dumb!): Is the map of <original-name, anonymized-name>
available to the user while he attempts to confirm that the bug is still
present?

For example, if I anonymized git.git, and did 'git branch -v' (say), how
easy would it be for me to recognise which branch was 'next'?
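
To make the question concrete, here is a minimal Python sketch of the consistent replacement the patch describes. Each distinct original string gets the next numbered placeholder in its category, and any repeat reuses that placeholder. The class and method names, the placeholder formats other than "User N", and keeping the refs/heads/-style prefixes are all assumptions made for illustration. None of this describes fast-export's actual C implementation. In this sketch the original-to-anonymized map lives only in the process's memory.

```python
class Anonymizer:
    """Hypothetical sketch of consistent anonymization (not git's code)."""

    # Assumption: well-known ref prefixes are kept so branches stay branches.
    REF_PREFIXES = ("refs/heads/", "refs/tags/", "refs/remotes/")

    def __init__(self):
        self._maps = {}      # category -> {original: replacement}
        self._counters = {}  # category -> next number to hand out

    def token(self, category, original, fmt):
        """Return the placeholder for `original`, creating it on first sight."""
        mapping = self._maps.setdefault(category, {})
        if original not in mapping:
            n = self._counters.get(category, 0)
            self._counters[category] = n + 1
            mapping[original] = fmt.format(n)
        return mapping[original]

    def ident(self, name_email):
        # The same author always maps to the same "User N <userN@...>".
        return self.token("ident", name_email, "User {0} <user{0}@example.com>")

    def path(self, path):
        # Component by component, so files sharing a directory still share
        # an anonymized parent and the shape of each tree is kept.
        return "/".join(self.token("path", part, "path{}") for part in path.split("/"))

    def refname(self, ref):
        for prefix in self.REF_PREFIXES:
            if ref.startswith(prefix):
                rest = ref[len(prefix):].split("/")
                return prefix + "/".join(self.token("ref", p, "ref{}") for p in rest)
        return self.token("ref", ref, "ref{}")


if __name__ == "__main__":
    a = Anonymizer()
    print(a.refname("refs/heads/next"))     # refs/heads/ref0
    print(a.refname("refs/heads/master"))   # refs/heads/ref1
    print(a.path("Documentation/git.txt"))  # path0/path1
```

In a scheme like this, telling which anonymized branch was 'next' would require the in-memory map. The anonymized stream alone does not reveal it.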

ATB,
Ramsay Jones
