git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Duy Nguyen <pclouds@gmail.com>
Subject: Re: [PATCH v2] teach fast-export an --anonymize option
Date: Wed, 27 Aug 2014 09:01:02 -0700
Message-ID: <xmqq8um9gbwh.fsf@gitster.dls.corp.google.com>
In-Reply-To: <20140821232100.GA27849@peff.net> (Jeff King's message of "Thu, 21 Aug 2014 19:21:00 -0400")

Jeff King <peff@peff.net> writes:

> diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
> new file mode 100755
> index 0000000..f76ffe4
> --- /dev/null
> +++ b/t/t9351-fast-export-anonymize.sh
> @@ -0,0 +1,117 @@
> +#!/bin/sh
> +
> +test_description='basic tests for fast-export --anonymize'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup simple repo' '
> +	test_commit base &&
> +	test_commit foo &&
> +	git checkout -b other HEAD^ &&
> +	mkdir subdir &&
> +	test_commit subdir/bar &&
> +	test_commit subdir/xyzzy &&
> +	git tag -m "annotated tag" mytag
> +'
> +
> +test_expect_success 'export anonymized stream' '
> +	git fast-export --anonymize --all >stream
> +'
> +
> +# this also covers commit messages
> +test_expect_success 'stream omits path names' '
> +	! fgrep base stream &&
> +	! fgrep foo stream &&
> +	! fgrep subdir stream &&
> +	! fgrep bar stream &&
> +	! fgrep xyzzy stream
> +'

I know there are a few isolated places that already use "fgrep", but
let's not spread the disease. Neither "fgrep" nor "egrep" appears in
POSIX and they can easily be spelled more portably as "grep -F" and
"grep -E", respectively.

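To illustrate the substitution, here is a minimal sketch. The sample
file and its contents are made up for this example and are not part of
the patch.

```shell
#!/bin/sh
# Hypothetical sample file, used only for this illustration.
sample=${TMPDIR:-/tmp}/grep-sample.$$
printf '%s\n' 'refs/heads/master' 'a.b literal' 'axb regex' >"$sample"

# "grep -F" takes the pattern as a fixed string (what fgrep does),
# so the dot matches only a literal dot: one line.
fixed_hits=$(grep -F -c 'a.b' "$sample")

# Plain grep treats the dot as "any character": two lines.
regex_hits=$(grep -c 'a.b' "$sample")

# "grep -E" enables extended regular expressions (what egrep does).
ere_hits=$(grep -E -c 'master|literal' "$sample")

echo "fixed=$fixed_hits regex=$regex_hits ere=$ere_hits"
rm -f "$sample"
```
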
> +test_expect_success 'stream allows master as refname' '
> +	fgrep master stream
> +'
> +
> +test_expect_success 'stream omits other refnames' '
> +	! fgrep other stream
> +'

What should happen to mytag?

> +
> +test_expect_success 'stream omits identities' '
> +	! fgrep "$GIT_COMMITTER_NAME" stream &&
> +	! fgrep "$GIT_COMMITTER_EMAIL" stream &&
> +	! fgrep "$GIT_AUTHOR_NAME" stream &&
> +	! fgrep "$GIT_AUTHOR_EMAIL" stream
> +'
> +
> +test_expect_success 'stream omits tag message' '
> +	! fgrep "annotated tag" stream
> +'
> +
> +# NOTE: we chdir to the new, anonymized repository
> +# after this. All further tests should assume this.
> +test_expect_success 'import stream to new repository' '
> +	git init new &&
> +	cd new &&
> +	git fast-import <../stream
> +'
> +
> +test_expect_success 'result has two branches' '
> +	git for-each-ref --format="%(refname)" refs/heads >branches &&
> +	test_line_count = 2 branches &&
> +	other_branch=$(grep -v refs/heads/master branches)
> +'
> +
> +test_expect_success 'repo has original shape' '
> +	cat >expect <<-\EOF &&
> +	> subject 3
> +	> subject 2
> +	< subject 1
> +	- subject 0
> +	EOF
> +	git log --format="%m %s" --left-right --boundary \
> +		master...$other_branch >actual &&
> +	test_cmp expect actual
> +'

Yuck and Hmph.  Doing a shape-preserving conversion is very
important, but I wonder if we can verify it without having to cast a
particular rewrite rule in stone.  We know we want to preserve the
relative order of committer timestamps (to reproduce bugs that
depend on the traversal order), and it may be OK to reuse exactly
the same committer timestamps from the original, in which case we
can make sure that we create the original history with appropriate
"test_tick"s (I think test_commit does that for us) and use "%ct"
instead of "%s" here, perhaps?  That way we can later change the
rewrite rules of commit object payload without having to adjust
this test.
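
Something along these lines, perhaps.  This is only a standalone
sketch, not a replacement hunk for the test script: it assumes a git
with "fast-export --anonymize" on $PATH and that the anonymizer keeps
committer timestamps as they are.  The "commit" helper, the
identities and the file names are made up and merely stand in for
what test_commit and test_tick would give us.

```shell
#!/bin/sh
# Sketch: compare history shape by committer timestamp (%ct), which is
# assumed to survive anonymization, instead of by rewritten subject (%s).
set -e
work=$(mktemp -d)
cd "$work"

# Isolate from the user's configuration; identities are placeholders.
HOME=$work GIT_CONFIG_NOSYSTEM=1
GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL=author@example.com
GIT_COMMITTER_NAME='C O Mitter' GIT_COMMITTER_EMAIL=committer@example.com
export HOME GIT_CONFIG_NOSYSTEM GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL \
	GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# Hypothetical stand-in for test_tick: each commit gets a distinct,
# strictly increasing timestamp.
tick=1112911993
commit () {
	tick=$((tick + 60))
	GIT_AUTHOR_DATE="$tick -0700" GIT_COMMITTER_DATE="$tick -0700" \
		git commit -q --allow-empty -m "$1"
}

git init -q orig
cd orig
git symbolic-ref HEAD refs/heads/master
commit base
commit foo
git checkout -q -b other HEAD^
commit bar
commit xyzzy

# Record the expected shape from the original history.
git log --format="%m %ct" --left-right --boundary master...other >../expect

git fast-export --anonymize --all >../stream
git init -q ../new
cd ../new
git fast-import --quiet <../stream

# Refnames may have been rewritten, so identify each branch by the
# committer timestamp of its tip rather than by name.
left_ct=$(git -C ../orig log -1 --format=%ct master)
for ref in $(git for-each-ref --format="%(refname)" refs/heads)
do
	if test "$(git log -1 --format=%ct "$ref")" = "$left_ct"
	then
		left=$ref
	else
		right=$ref
	fi
done

git log --format="%m %ct" --left-right --boundary "$left...$right" >../actual
cmp ../expect ../actual && echo "shape preserved"
```

Because "expect" is generated from the original history, nothing in
the comparison depends on what the anonymizer writes into the commit
message.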

> +
> +test_expect_success 'root tree has original shape' '
> +	cat >expect <<-\EOF &&
> +	blob
> +	tree
> +	EOF
> +	git ls-tree $other_branch >root &&
> +	cut -d" " -f2 <root >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'paths in subdir ended up in one tree' '
> +	cat >expect <<-\EOF &&
> +	blob
> +	blob
> +	EOF
> +	tree=$(grep tree root | cut -f2) &&
> +	git ls-tree $other_branch:$tree >tree &&
> +	cut -d" " -f2 <tree >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'tag points to branch tip' '
> +	git rev-parse $other_branch >expect &&
> +	git for-each-ref --format="%(*objectname)" | grep . >actual &&
> +	test_cmp expect actual
> +'

I notice you haven't checked how many tags you have in the
repository, unlike the number of branches which you counted
earlier.
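
A tag count could be checked the same way the branch count is.  The
following is a standalone sketch rather than a hunk for the test
script; it assumes a git with "fast-export --anonymize" on $PATH, and
the identities and file names are placeholders.

```shell
#!/bin/sh
# Sketch: after a round trip through the anonymizer, the new repository
# should hold exactly as many tags as the original (here, one).
set -e
work=$(mktemp -d)
cd "$work"

# Isolate from the user's configuration; identities are placeholders.
HOME=$work GIT_CONFIG_NOSYSTEM=1
GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL=author@example.com
GIT_COMMITTER_NAME='C O Mitter' GIT_COMMITTER_EMAIL=committer@example.com
export HOME GIT_CONFIG_NOSYSTEM GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL \
	GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

git init -q orig
cd orig
git commit -q --allow-empty -m base
git tag -m "annotated tag" mytag

git fast-export --anonymize --all >../stream
git init -q ../new
cd ../new
git fast-import --quiet <../stream

# Count refs under refs/tags, mirroring the refs/heads check.
git for-each-ref --format="%(refname)" refs/tags >../tags
tag_count=$(($(wc -l <../tags)))
test "$tag_count" -eq 1 && echo "one tag, as in the original"
```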

> +test_expect_success 'idents are shared' '
> +	git log --all --format="%an <%ae>" >authors &&
> +	sort -u authors >unique &&
> +	test_line_count = 1 unique &&
> +	git log --all --format="%cn <%ce>" >committers &&
> +	sort -u committers >unique &&
> +	test_line_count = 1 unique &&
> +	! test_cmp authors committers
> +'

Two commits by the same author must convert to two commits by the
same anonymized author, but that is not tested here; the history
made in the 'setup simple repo' step is a bit too simple to do that
anyway, though ;-).
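
A history with more than one author would allow checking that.  Here
is a standalone sketch, again assuming a git with "fast-export
--anonymize" on $PATH and that timestamps (and hence log order) are
retained.  The "commit_as" and "pattern" helpers and the Alice/Bob
identities are made up for the illustration.

```shell
#!/bin/sh
# Sketch: the same original author should always map to the same
# anonymized author, and different authors should stay different.
set -e
work=$(mktemp -d)
cd "$work"

# Isolate from the user's configuration; identities are placeholders.
HOME=$work GIT_CONFIG_NOSYSTEM=1
GIT_AUTHOR_NAME=unused GIT_AUTHOR_EMAIL=unused@example.com
GIT_COMMITTER_NAME='C O Mitter' GIT_COMMITTER_EMAIL=committer@example.com
export HOME GIT_CONFIG_NOSYSTEM GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL \
	GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# Hypothetical helper: commit as the given author at a fresh timestamp.
tick=1112911993
commit_as () {
	tick=$((tick + 60))
	GIT_AUTHOR_NAME=$1 GIT_AUTHOR_EMAIL=$2 \
	GIT_AUTHOR_DATE="$tick -0700" GIT_COMMITTER_DATE="$tick -0700" \
		git commit -q --allow-empty -m "$3"
}

# Replace each distinct line with the index of its first appearance, so
# two logs compare equal exactly when idents repeat in the same pattern.
pattern () {
	awk '!($0 in seen) { seen[$0] = ++n } { print seen[$0] }'
}

git init -q orig
cd orig
commit_as Alice alice@example.com one
commit_as Bob bob@example.com two
commit_as Alice alice@example.com three
git log --format="%an <%ae>" | pattern >../expect

git fast-export --anonymize --all >../stream
git init -q ../new
cd ../new
git fast-import --quiet <../stream
git log --all --format="%an <%ae>" | pattern >../actual

cmp ../expect ../actual && echo "author mapping is consistent"
```

Comparing the repetition pattern rather than the names themselves
keeps the check independent of how the anonymized idents are spelled.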

> +test_expect_success 'commit timestamps are retained' '
> +	git log --all --format="%ct" >timestamps &&
> +	sort -u timestamps >unique &&
> +	test_line_count = 4 unique
> +'
> +
> +test_done
