From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Eric Sunshine <sunshine@sunshineco.com>,
Junio C Hamano <gitster@pobox.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: [PATCH v2 0/4] fast-export: allow dumping anonymization mappings
Date: Mon, 22 Jun 2020 17:47:45 -0400
Message-ID: <20200622214745.GA3302779@coredump.intra.peff.net>
In-Reply-To: <20200619132304.GA2540657@coredump.intra.peff.net>

On Fri, Jun 19, 2020 at 09:23:04AM -0400, Jeff King wrote:

> This series gives an alternate way to achieve the same effect, but much
> better in that it works for _any_ ref (so if you are trying to reproduce
> the effect of "rev-list origin/foo..bar" in the anonymized repo, you can
> easily do so). Ditto for paths, so that "rev-list -- foo.c" can be
> reproduced in the anonymized repo.

Here's a v2 which I think addresses all of the comments. I have to admit
that after writing my last email to Junio, I am wondering whether it
would be sufficient and simpler to let the user specify a static mapping
of tokens (that could just be applied anywhere).

I'll take a look at that, but since I worked up this version, here it is
in the meantime.
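
For illustration, the refname mapping dumped by this series holds one "original anonymized" pair per line, so translating a ref for use in the anonymized repository is a small lookup. What follows is only a rough sketch: the helper name anon_ref and the anonymized names in the sample file are made up, and real output will differ.

```shell
# Hypothetical helper (not part of the series): print the anonymized
# name for a ref, given a mapping file with one
# "<original> <anonymized>" pair per line. Refnames cannot contain
# spaces, so splitting on whitespace is safe here.
anon_ref () {
	awk -v ref="$1" '$1 == ref { print $2; found = 1; exit }
		END { exit !found }' "$2"
}

# Made-up sample mapping; real anonymized names will differ.
mapping=$(mktemp)
cat >"$mapping" <<\EOF
refs/heads/other refs/heads/ref0
refs/remotes/origin/foo refs/remotes/ref1/ref2
refs/tags/mytag refs/tags/ref3
EOF

anon_ref refs/remotes/origin/foo "$mapping"
```

With a real mapping file, the result could be substituted into a command run in the anonymized repository, along the lines of `git rev-list "$(anon_ref refs/remotes/origin/foo refs.out)..$(anon_ref refs/heads/bar refs.out)"`. The helper exits non-zero when the ref is not in the mapping.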

The interesting changes are:

  - path output is now quoted, making it unambiguous. The intent is for
    humans to look at it, but it's not much extra work to make it
    machine readable, too.

  - the path dumping was in the wrong spot. It was happening in the
    generic function that's used for "path-like" things, including
    refnames. So the path mapping dump had extra cruft in it.

  - got rid of the maybe_dump_anon() helper

  - tests now avoid hard-coding expected counts

  - the path-dump test now checks the expected count
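
To show why quoting makes the path mapping machine readable, here is a minimal sketch of splitting one mapping line into its two fields. It assumes the original path is either unquoted (and so contains no space) or wrapped in double quotes with backslash escapes inside. The function name split_path_mapping and the anonymized names in the sample input are invented for the example, and the sketch leaves the original field quoted rather than unescaping it.

```shell
# Hypothetical helper (not part of the series): read path-mapping lines
# on stdin and print "<original><TAB><anonymized>" for each one.
split_path_mapping () {
	awk '{
		line = $0
		if (substr(line, 1, 1) == "\"") {
			# Quoted: scan to the closing quote, skipping
			# backslash-escaped characters along the way.
			i = 2
			while (i <= length(line)) {
				c = substr(line, i, 1)
				if (c == "\\")
					i += 2
				else if (c == "\"")
					break
				else
					i++
			}
			orig = substr(line, 1, i)
			anon = substr(line, i + 2)
		} else {
			# Unquoted: the original contains no spaces.
			i = index(line, " ")
			orig = substr(line, 1, i - 1)
			anon = substr(line, i + 1)
		}
		printf "%s\t%s\n", orig, anon
	}'
}

# Made-up sample lines; real anonymized paths will differ.
printf '%s\n' \
	'foo.c path0' \
	'"subdir/this needs quoting" path1/path2' |
split_path_mapping
```

Without the quotes, the second sample line would be ambiguous, since a plain split on the first space would cut the original path in the middle.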

  [1/4]: fast-export: allow dumping the refname mapping
  [2/4]: fast-export: anonymize "master" refname
  [3/4]: fast-export: refactor path printing to not rely on stdout
  [4/4]: fast-export: allow dumping the path mapping

 Documentation/git-fast-export.txt | 34 +++++++++++++++
 builtin/fast-export.c             | 69 +++++++++++++++++++++++++------
 t/t9351-fast-export-anonymize.sh  | 44 ++++++++++++++++----
 3 files changed, 125 insertions(+), 22 deletions(-)

Range-diff from v1:
1: 82a17ae976 ! 1: 7ba5582d66 fast-export: allow dumping the refname mapping
@@ builtin/fast-export.c: static int has_unshown_parent(struct commit *commit)
+ kh_put_strset(seen->set, xstrdup(str), &hashret);
+ return 0;
+}
-+
-+static void maybe_dump_anon(FILE *out, struct seen_set *seen,
-+ const char *orig, const char *anon)
-+{
-+ if (!out)
-+ return;
-+ if (!check_and_mark_seen(seen, orig))
-+ fprintf(out, "%s %s\n", orig, anon);
-+}
+
struct anonymized_entry {
struct hashmap_entry hash;
@@ builtin/fast-export.c: static const char *anonymize_refname(const char *refname)
}
anonymize_path(&anon, refname, &refs, anonymize_ref_component);
-+ maybe_dump_anon(anonymized_refnames_handle, &seen,
++
++ if (anonymized_refnames_handle &&
++ !check_and_mark_seen(&seen, full_refname))
++ fprintf(anonymized_refnames_handle, "%s %s\n",
+ full_refname, anon.buf);
++
return anon.buf;
}
@@ t/t9351-fast-export-anonymize.sh: test_expect_success 'stream omits tag message'
+ # we make no guarantees of the exact anonymized names,
+ # so just check that we have the right number and
+ # that a sample line looks sane.
++ expected_count=$(git for-each-ref | wc -l) &&
+ # Note that master is not anonymized, and so not included
+ # in the mapping.
-+ test_line_count = 6 refs.out &&
++ expected_count=$((expected_count - 1)) &&
++ test_line_count = $expected_count refs.out &&
+ grep "^refs/heads/other refs/heads/" refs.out
+'
+
2: be56b375cc ! 2: d88f7c83a5 fast-export: anonymize "master" refname
@@ t/t9351-fast-export-anonymize.sh: test_expect_success 'stream omits path names'
! grep mytag stream
'
@@ t/t9351-fast-export-anonymize.sh: test_expect_success 'refname mapping can be dumped' '
- # we make no guarantees of the exact anonymized names,
# so just check that we have the right number and
# that a sample line looks sane.
+ expected_count=$(git for-each-ref | wc -l) &&
- # Note that master is not anonymized, and so not included
- # in the mapping.
-- test_line_count = 6 refs.out &&
-+ test_line_count = 7 refs.out &&
+- expected_count=$((expected_count - 1)) &&
+ test_line_count = $expected_count refs.out &&
grep "^refs/heads/other refs/heads/" refs.out
'
-
@@ t/t9351-fast-export-anonymize.sh: test_expect_success 'import stream to new repository' '
test_expect_success 'result has two branches' '
git for-each-ref --format="%(refname)" refs/heads >branches &&
-: ---------- > 3: 164f1e1eab fast-export: refactor path printing to not rely on stdout
3: a4e9f1f2ac ! 4: b0aa59f07e fast-export: allow dumping the path mapping
@@ Commit message
We recently taught fast-export to dump the refname mapping. Let's do the
same thing for paths, which can reuse most of the same infrastructure.
- Note that the output format isn't unambiguous here (because paths could
- contain spaces). That's OK because this is meant to be examined by a
- human.
We could also just introduce a "dump mapping" file that shows every
mapping we make. But it would be a bit more awkward to work with, as the
@@ Documentation/git-fast-export.txt: by keeping the marks the same across runs.
+ Output the mapping of real paths to anonymized paths to <file>.
+ The output will contain one line per path that appears in the
+ output stream, with the original path, a space, and its
-+ anonymized counterpart. See the section on `ANONYMIZING` below.
++ anonymized counterpart. Paths may be quoted if they contain a
++ space, or unusual characters; see `core.quotePath` in
++ linkgit:git-config(1). See also `ANONYMIZING` below.
+
--reference-excluded-parents::
By default, running a command such as `git fast-export
@@ builtin/fast-export.c: static struct string_list tag_refs = STRING_LIST_INIT_NOD
static struct revision_sources revision_sources;
static int parse_opt_signed_tag_mode(const struct option *opt,
-@@ builtin/fast-export.c: static void anonymize_path(struct strbuf *out, const char *path,
- struct hashmap *map,
- void *(*generate)(const void *, size_t *))
- {
-+ static struct seen_set seen;
-+ const char *full_path = path;
+@@ builtin/fast-export.c: static void print_path(const char *path)
+ print_path_1(stdout, path);
+ else {
+ static struct hashmap paths;
++ static struct seen_set seen;
+ static struct strbuf anon = STRBUF_INIT;
+
+ anonymize_path(&anon, path, &paths, anonymize_path_component);
++ if (anonymized_paths_handle &&
++ !check_and_mark_seen(&seen, path)) {
++ print_path_1(anonymized_paths_handle, path);
++ fputc(' ', anonymized_paths_handle);
++ print_path_1(anonymized_paths_handle, anon.buf);
++ fputc('\n', anonymized_paths_handle);
++ }
+
- while (*path) {
- const char *end_of_component = strchrnul(path, '/');
- size_t len = end_of_component - path;
-@@ builtin/fast-export.c: static void anonymize_path(struct strbuf *out, const char *path,
- if (*path)
- strbuf_addch(out, *path++);
+ print_path_1(stdout, anon.buf);
+ strbuf_reset(&anon);
}
-+
-+ maybe_dump_anon(anonymized_paths_handle, &seen, full_path, out->buf);
- }
-
- static inline void *mark_to_ptr(uint32_t mark)
@@ builtin/fast-export.c: int cmd_fast_export(int argc, const char **argv, const char *prefix)
*import_filename = NULL,
*import_filename_if_exists = NULL;
@@ builtin/fast-export.c: int cmd_fast_export(int argc, const char **argv, const ch
printf("feature done\n");
## t/t9351-fast-export-anonymize.sh ##
+@@ t/t9351-fast-export-anonymize.sh: test_expect_success 'setup simple repo' '
+ git checkout -b other HEAD^ &&
+ mkdir subdir &&
+ test_commit subdir/bar &&
+- test_commit subdir/xyzzy &&
++ test_commit quoting "subdir/this needs quoting" &&
+ git tag -m "annotated tag" mytag
+ '
+
+@@ t/t9351-fast-export-anonymize.sh: test_expect_success 'stream omits path names' '
+ ! grep foo stream &&
+ ! grep subdir stream &&
+ ! grep bar stream &&
+- ! grep xyzzy stream
++ ! grep quoting stream
+ '
+
+ test_expect_success 'stream omits refnames' '
@@ t/t9351-fast-export-anonymize.sh: test_expect_success 'refname mapping can be dumped' '
grep "^refs/heads/other refs/heads/" refs.out
'
+test_expect_success 'path mapping can be dumped' '
+ git fast-export --anonymize --all \
+ --dump-anonymized-paths=paths.out >/dev/null &&
-+ # do not assume a particular anonymization scheme or order;
-+ # just sanity check that a sample line looks sensible.
-+ grep "^foo " paths.out
++ # as above, avoid depending on the exact scheme,
++ # but check that we have the right number of mappings,
++ # and spot-check one sample.
++ expected_count=$(
++ git rev-list --objects --all |
++ git cat-file --batch-check="%(objecttype) %(rest)" |
++ sed -ne "s/^blob //p" |
++ sort -u |
++ wc -l
++ ) &&
++ test_line_count = $expected_count paths.out &&
++ grep "^\"subdir/this needs quoting\" " paths.out
+'
+
# NOTE: we chdir to the new, anonymized repository