From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: [PATCH 3/3] fast-export: allow dumping the path mapping
Date: Fri, 19 Jun 2020 09:29:23 -0400
Message-ID: <20200619132923.GA2540897@coredump.intra.peff.net>
In-Reply-To: <20200619132304.GA2540657@coredump.intra.peff.net>

When working with an anonymized repo, it can be useful to be able to
refer to particular paths. E.g., to reproduce a bug triggered by
"git rev-list -- foo.c" in the original repo, you would need to replace
"foo.c" with its anonymized counterpart to produce the same effect.

We recently taught fast-export to dump the refname mapping. Let's do the
same thing for paths, which can reuse most of the same infrastructure.
Note that the output format is ambiguous here (because paths could
contain spaces). That's OK because this is meant to be examined by a
human.

We could also just introduce a "dump mapping" file that shows every
mapping we make. But it would be a bit more awkward to work with, as the
user would have to sort through more data to find the parts they're
interested in (and there are likely to be many more paths than refnames,
making it annoying for people who just want to dump the refnames).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-fast-export.txt | 10 ++++++++++
 builtin/fast-export.c             | 12 ++++++++++++
 t/t9351-fast-export-anonymize.sh  |  8 ++++++++
 3 files changed, 30 insertions(+)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index e809bb3f18..c63f109f1d 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -125,6 +125,12 @@ by keeping the marks the same across runs.
 	the output stream, with the original refname, a space, and its
 	anonymized counterpart. See the section on `ANONYMIZING` below.
 
+--dump-anonymized-paths=<file>::
+	Output the mapping of real paths to anonymized paths to <file>.
+	The output will contain one line per path that appears in the
+	output stream, with the original path, a space, and its
+	anonymized counterpart. See the section on `ANONYMIZING` below.
+
 --reference-excluded-parents::
 	By default, running a command such as `git fast-export
 	master~5..master` will not include the commit master{tilde}5
@@ -261,6 +267,10 @@ refs/tags/v2.0 refs/tags/ref50
 which tells you that `git rev-list ref31..ref50` may produce the same
 bug in the re-imported anonymous repository.
 
+Likewise, `--dump-anonymized-paths` may be useful for a bug that
+involves pathspecs. E.g., `git rev-list v1.0..v2.0 -- foo.c` requires
+knowing the path corresponding to `foo.c` in the result.
+
 LIMITATIONS
 -----------
 
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index cd0174d514..ed1f8daa7f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -47,6 +47,7 @@ static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
 static int anonymize;
 static FILE *anonymized_refnames_handle;
+static FILE *anonymized_paths_handle;
 static struct revision_sources revision_sources;
 
 static int parse_opt_signed_tag_mode(const struct option *opt,
@@ -211,6 +212,9 @@ static void anonymize_path(struct strbuf *out, const char *path,
 			   struct hashmap *map,
 			   void *(*generate)(const void *, size_t *))
 {
+	static struct seen_set seen;
+	const char *full_path = path;
+
 	while (*path) {
 		const char *end_of_component = strchrnul(path, '/');
 		size_t len = end_of_component - path;
@@ -220,6 +224,8 @@ static void anonymize_path(struct strbuf *out, const char *path,
 		if (*path)
 			strbuf_addch(out, *path++);
 	}
+
+	maybe_dump_anon(anonymized_paths_handle, &seen, full_path, out->buf);
 }
 
 static inline void *mark_to_ptr(uint32_t mark)
@@ -1170,6 +1176,7 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 	     *import_filename = NULL,
 	     *import_filename_if_exists = NULL;
 	const char *anonymized_refnames_file = NULL;
+	const char *anonymized_paths_file = NULL;
 	uint32_t lastimportid;
 	struct string_list refspecs_list = STRING_LIST_INIT_NODUP;
 	struct string_list paths_of_changed_objects = STRING_LIST_INIT_DUP;
@@ -1206,6 +1213,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "dump-anonymized-refnames",
 			   &anonymized_refnames_file, N_("file"),
 			   N_("output anonymized refname mapping to <file>")),
+		OPT_STRING(0, "dump-anonymized-paths",
+			   &anonymized_paths_file, N_("file"),
+			   N_("output anonymized path mapping to <file>")),
 		OPT_BOOL(0, "reference-excluded-parents",
 			 &reference_excluded_commits, N_("Reference parents which are not in fast-export stream by object id")),
 		OPT_BOOL(0, "show-original-ids", &show_original_ids,
@@ -1244,6 +1254,8 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 
 	if (anonymized_refnames_file)
 		anonymized_refnames_handle = xfopen(anonymized_refnames_file, "w");
+	if (anonymized_paths_file)
+		anonymized_paths_handle = xfopen(anonymized_paths_file, "w");
 
 	if (use_done_feature)
 		printf("feature done\n");
diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
index 88847b0f60..3607b9b972 100755
--- a/t/t9351-fast-export-anonymize.sh
+++ b/t/t9351-fast-export-anonymize.sh
@@ -53,6 +53,14 @@ test_expect_success 'refname mapping can be dumped' '
 	grep "^refs/heads/other refs/heads/" refs.out
 '
 
+test_expect_success 'path mapping can be dumped' '
+	git fast-export --anonymize --all \
+		--dump-anonymized-paths=paths.out >/dev/null &&
+	# do not assume a particular anonymization scheme or order;
+	# just sanity check that a sample line looks sensible.
+	grep "^foo " paths.out
+'
+
 # NOTE: we chdir to the new, anonymized repository
 # after this. All further tests should assume this.
 test_expect_success 'import stream to new repository' '
-- 
2.27.0.480.g4f98dbcb10
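
As a usage sketch for the mapping file described in the commit message
above: the snippet below looks up the anonymized counterpart of one
original path. The helper name lookup_anon_path and the sample mapping
lines are hypothetical and not part of the patch; real anonymized names
will differ. It assumes neither path contains a space, which is exactly
the ambiguity the commit message mentions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): print the anonymized
# counterpart of original path $1, using mapping file $2 as written by
# --dump-anonymized-paths ("<original> <anonymized>" per line).
# Assumes paths contain no spaces; returns non-zero if $1 is not found.
lookup_anon_path () {
	while read -r orig anon
	do
		if test "$orig" = "$1"
		then
			printf '%s\n' "$anon"
			return 0
		fi
	done <"$2"
	return 1
}

# Demo with a made-up mapping; real anonymized names will differ.
map=$(mktemp)
printf '%s\n' 'foo path0' 'dir/bar.c path1/path2.c' >"$map"
lookup_anon_path dir/bar.c "$map"
rm -f "$map"
```

The printed name could then be used as the pathspec when re-running the
failing command in the re-imported anonymized repository.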
