git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/6] minor cleanups to fast-export --anonymize
@ 2023-03-22 17:36 Jeff King
  2023-03-22 17:37 ` [PATCH 1/6] fast-export: drop const when storing anonymized values Jeff King
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Jeff King @ 2023-03-22 17:36 UTC (permalink / raw)
  To: git

My goal here was cleaning up some -Wunused-parameter warnings, but doing
so required a few preparatory cleanups, one of which actually fixes a
really minor bug (in patch 4).

  [1/6]: fast-export: drop const when storing anonymized values
  [2/6]: fast-export: simplify initialization of anonymized hashmaps
  [3/6]: fast-export: factor out anonymized_entry creation
  [4/6]: fast-export: de-obfuscate --anonymize-map handling
  [5/6]: fast-export: drop data parameter from anonymous generators
  [6/6]: fast-export: drop unused parameter from anonymize_commit_message()

 builtin/fast-export.c            | 81 ++++++++++++++++++--------------
 t/t9351-fast-export-anonymize.sh |  2 +
 2 files changed, 47 insertions(+), 36 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/6] fast-export: drop const when storing anonymized values
  2023-03-22 17:36 [PATCH 0/6] minor cleanups to fast-export --anonymize Jeff King
@ 2023-03-22 17:37 ` Jeff King
  2023-03-22 17:38 ` [PATCH 2/6] fast-export: simplify initialization of anonymized hashmaps Jeff King
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2023-03-22 17:37 UTC (permalink / raw)
  To: git

We store anonymized values as pointers to "const char *", since they are
conceptually const to callers who use them. But they are actually
allocated strings whose memory is owned by the struct.

The ownership mismatch hasn't been a big deal since we never free() them
(they are held until the program ends), but let's switch them to "char *"
in preparation for changing that.

Since most code only accesses them via anonymize_str(), it can continue
to narrow them to "const char *" in its return value.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fast-export.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 78493c6d2bf..f422819c82a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -110,7 +110,7 @@ static struct decoration idnums;
 static uint32_t last_idnum;
 struct anonymized_entry {
 	struct hashmap_entry hash;
-	const char *anon;
+	char *anon;
 	const char orig[FLEX_ARRAY];
 };
 
-- 
2.40.0.595.g9b96b494d8c


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/6] fast-export: simplify initialization of anonymized hashmaps
  2023-03-22 17:36 [PATCH 0/6] minor cleanups to fast-export --anonymize Jeff King
  2023-03-22 17:37 ` [PATCH 1/6] fast-export: drop const when storing anonymized values Jeff King
@ 2023-03-22 17:38 ` Jeff King
  2023-03-22 17:40 ` [PATCH 3/6] fast-export: factor out anonymized_entry creation Jeff King
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2023-03-22 17:38 UTC (permalink / raw)
  To: git

We take pains to avoid doing a lookup on a hashmap which has not been
initialized with hashmap_init(). That was necessary back when this code
was written. But hashmap_get() became safer in b7879b0ba6e (hashmap:
allow re-use after hashmap_free(), 2020-11-02). Since then it's OK to
call functions on a zero-initialized table; it will just correctly
return NULL, since there is no match.

This simplifies the code a little, and also lets us keep the
initialization line closer to when we add an entry (which is when the
hashmap really does need to be totally initialized). That will help
later refactoring.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fast-export.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index f422819c82a..ba9ab3a97e5 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -152,25 +152,22 @@ static const char *anonymize_str(struct hashmap *map,
 	struct anonymized_entry_key key;
 	struct anonymized_entry *ret;
 
-	if (!map->cmpfn)
-		hashmap_init(map, anonymized_entry_cmp, NULL, 0);
-
 	hashmap_entry_init(&key.hash, memhash(orig, len));
 	key.orig = orig;
 	key.orig_len = len;
 
 	/* First check if it's a token the user configured manually... */
-	if (anonymized_seeds.cmpfn)
-		ret = hashmap_get_entry(&anonymized_seeds, &key, hash, &key);
-	else
-		ret = NULL;
+	ret = hashmap_get_entry(&anonymized_seeds, &key, hash, &key);
 
 	/* ...otherwise check if we've already seen it in this context... */
 	if (!ret)
 		ret = hashmap_get_entry(map, &key, hash, &key);
 
 	/* ...and finally generate a new mapping if necessary */
 	if (!ret) {
+		if (!map->cmpfn)
+			hashmap_init(map, anonymized_entry_cmp, NULL, 0);
+
 		FLEX_ALLOC_MEM(ret, orig, orig, len);
 		hashmap_entry_init(&ret->hash, key.hash.hash);
 		ret->anon = generate(data);
-- 
2.40.0.595.g9b96b494d8c


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/6] fast-export: factor out anonymized_entry creation
  2023-03-22 17:36 [PATCH 0/6] minor cleanups to fast-export --anonymize Jeff King
  2023-03-22 17:37 ` [PATCH 1/6] fast-export: drop const when storing anonymized values Jeff King
  2023-03-22 17:38 ` [PATCH 2/6] fast-export: simplify initialization of anonymized hashmaps Jeff King
@ 2023-03-22 17:40 ` Jeff King
  2023-03-22 17:42 ` [PATCH 4/6] fast-export: de-obfuscate --anonymize-map handling Jeff King
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2023-03-22 17:40 UTC (permalink / raw)
  To: git

When anonymizing output, there's only one spot where we generate new
entries to add to our hashmap: when anonymize_str() doesn't find an
entry, we use the generate() callback to make one and add it. Let's pull
that into its own function in preparation for another caller.

Note that we'll add one extra feature. In anonymize_str(), we know that
we won't find an existing entry in the hashmap (since it will only try
to add after failing to find one). But other callers won't have the same
behavior, so we should catch this case and free the now-dangling entry.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fast-export.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index ba9ab3a97e5..5a0b63219ad 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -139,6 +139,29 @@ static int anonymized_entry_cmp(const void *cmp_data UNUSED,
 	return strcmp(a->orig, b->orig);
 }
 
+static struct anonymized_entry *add_anonymized_entry(struct hashmap *map,
+						     unsigned hash,
+						     const char *orig, size_t len,
+						     char *anon)
+{
+	struct anonymized_entry *ret, *old;
+
+	if (!map->cmpfn)
+		hashmap_init(map, anonymized_entry_cmp, NULL, 0);
+
+	FLEX_ALLOC_MEM(ret, orig, orig, len);
+	hashmap_entry_init(&ret->hash, hash);
+	ret->anon = anon;
+	old = hashmap_put_entry(map, ret, hash);
+
+	if (old) {
+		free(old->anon);
+		free(old);
+	}
+
+	return ret;
+}
+
 /*
  * Basically keep a cache of X->Y so that we can repeatedly replace
  * the same anonymized string with another. The actual generation
@@ -164,15 +187,9 @@ static const char *anonymize_str(struct hashmap *map,
 		ret = hashmap_get_entry(map, &key, hash, &key);
 
 	/* ...and finally generate a new mapping if necessary */
-	if (!ret) {
-		if (!map->cmpfn)
-			hashmap_init(map, anonymized_entry_cmp, NULL, 0);
-
-		FLEX_ALLOC_MEM(ret, orig, orig, len);
-		hashmap_entry_init(&ret->hash, key.hash.hash);
-		ret->anon = generate(data);
-		hashmap_put(map, &ret->hash);
-	}
+	if (!ret)
+		ret = add_anonymized_entry(map, key.hash.hash,
+					   orig, len, generate(data));
 
 	return ret->anon;
 }
-- 
2.40.0.595.g9b96b494d8c


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/6] fast-export: de-obfuscate --anonymize-map handling
  2023-03-22 17:36 [PATCH 0/6] minor cleanups to fast-export --anonymize Jeff King
                   ` (2 preceding siblings ...)
  2023-03-22 17:40 ` [PATCH 3/6] fast-export: factor out anonymized_entry creation Jeff King
@ 2023-03-22 17:42 ` Jeff King
  2023-03-22 17:42 ` [PATCH 5/6] fast-export: drop data parameter from anonymous generators Jeff King
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2023-03-22 17:42 UTC (permalink / raw)
  To: git

When we handle an --anonymize-map option, we parse the orig/anon pair,
and then feed the "orig" string to anonymize_str(), along with a
generator function that duplicates the "anon" string to be cached in the
map.

This works, because anonymize_str() says "ah, there is no mapping yet
for orig; I'll add one from the generator". But there are some
downsides:

  1. It's a bit too clever, as it's not obvious what the code is trying
     to do or why it works.

  2. It requires allowing generator functions to take an extra void
     pointer, which is not something any of the normal callers of
     anonymize_str() want.

  3. It does the wrong thing if the same token is provided twice.
     When there are conflicting options, like:

       git fast-export --anonymize \
         --anonymize-map=foo:one \
	 --anonymize-map=foo:two

     we usually let the second one override the first. But by using
     anonymize_str(), which has first-one-wins logic, we do the
     opposite.

So instead of relying on anonymize_str(), let's directly add the entry
ourselves. We can tweak the tests to show that we handle overridden
options correctly now.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fast-export.c            | 8 ++------
 t/t9351-fast-export-anonymize.sh | 2 ++
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 5a0b63219ad..d1e7b26dc64 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -1140,11 +1140,6 @@ static void handle_deletes(void)
 	}
 }
 
-static char *anonymize_seed(void *data)
-{
-	return xstrdup(data);
-}
-
 static int parse_opt_anonymize_map(const struct option *opt,
 				   const char *arg, int unset)
 {
@@ -1166,7 +1161,8 @@ static int parse_opt_anonymize_map(const struct option *opt,
 	if (!keylen || !*value)
 		return error(_("--anonymize-map token cannot be empty"));
 
-	anonymize_str(map, anonymize_seed, arg, keylen, (void *)value);
+	add_anonymized_entry(map, memhash(arg, keylen), arg, keylen,
+			     xstrdup(value));
 
 	return 0;
 }
diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
index 77047e250dc..156a6474847 100755
--- a/t/t9351-fast-export-anonymize.sh
+++ b/t/t9351-fast-export-anonymize.sh
@@ -25,6 +25,7 @@ test_expect_success 'setup simple repo' '
 test_expect_success 'export anonymized stream' '
 	git fast-export --anonymize --all \
 		--anonymize-map=retain-me \
+		--anonymize-map=xyzzy:should-not-appear \
 		--anonymize-map=xyzzy:custom-name \
 		--anonymize-map=other \
 		>stream
@@ -41,6 +42,7 @@ test_expect_success 'stream omits path names' '
 
 test_expect_success 'stream contains user-specified names' '
 	grep retain-me stream &&
+	! grep should-not-appear stream &&
 	grep custom-name stream
 '
 
-- 
2.40.0.595.g9b96b494d8c


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 5/6] fast-export: drop data parameter from anonymous generators
  2023-03-22 17:36 [PATCH 0/6] minor cleanups to fast-export --anonymize Jeff King
                   ` (3 preceding siblings ...)
  2023-03-22 17:42 ` [PATCH 4/6] fast-export: de-obfuscate --anonymize-map handling Jeff King
@ 2023-03-22 17:42 ` Jeff King
  2023-03-22 17:43 ` [PATCH 6/6] fast-export: drop unused parameter from anonymize_commit_message() Jeff King
  2023-03-24 17:40 ` [PATCH 0/6] minor cleanups to fast-export --anonymize Derrick Stolee
  6 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2023-03-22 17:42 UTC (permalink / raw)
  To: git

The anonymization code has a specific generator callback for each type
of data (e.g., one for paths, one for oids, and so on). These all take a
"data" parameter, but none of them use it for anything. Which is not
surprising, as the point is to generate a new name independent of any
input, and each function keeps its own static counter.

We added the extra pointer in d5bf91fde44 (fast-export: add a "data"
callback parameter to anonymize_str(), 2020-06-23) to handle
--anonymize-map parsing, but that turned out to be awkward itself, and
was recently dropped.

So let's get rid of this "data" parameter that nobody is using, both
from the generators and from anonymize_str() which plumbed it through.
This simplifies the code, and makes -Wunused-parameter happier.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fast-export.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index d1e7b26dc64..12adf75964c 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -168,9 +168,8 @@ static struct anonymized_entry *add_anonymized_entry(struct hashmap *map,
  * is farmed out to the generate function.
  */
 static const char *anonymize_str(struct hashmap *map,
-				 char *(*generate)(void *),
-				 const char *orig, size_t len,
-				 void *data)
+				 char *(*generate)(void),
+				 const char *orig, size_t len)
 {
 	struct anonymized_entry_key key;
 	struct anonymized_entry *ret;
@@ -189,7 +188,7 @@ static const char *anonymize_str(struct hashmap *map,
 	/* ...and finally generate a new mapping if necessary */
 	if (!ret)
 		ret = add_anonymized_entry(map, key.hash.hash,
-					   orig, len, generate(data));
+					   orig, len, generate());
 
 	return ret->anon;
 }
@@ -202,12 +201,12 @@ static const char *anonymize_str(struct hashmap *map,
  */
 static void anonymize_path(struct strbuf *out, const char *path,
 			   struct hashmap *map,
-			   char *(*generate)(void *))
+			   char *(*generate)(void))
 {
 	while (*path) {
 		const char *end_of_component = strchrnul(path, '/');
 		size_t len = end_of_component - path;
-		const char *c = anonymize_str(map, generate, path, len, NULL);
+		const char *c = anonymize_str(map, generate, path, len);
 		strbuf_addstr(out, c);
 		path = end_of_component;
 		if (*path)
@@ -382,7 +381,7 @@ static void print_path_1(const char *path)
 		printf("%s", path);
 }
 
-static char *anonymize_path_component(void *data)
+static char *anonymize_path_component(void)
 {
 	static int counter;
 	struct strbuf out = STRBUF_INIT;
@@ -404,7 +403,7 @@ static void print_path(const char *path)
 	}
 }
 
-static char *generate_fake_oid(void *data)
+static char *generate_fake_oid(void)
 {
 	static uint32_t counter = 1; /* avoid null oid */
 	const unsigned hashsz = the_hash_algo->rawsz;
@@ -420,7 +419,7 @@ static const char *anonymize_oid(const char *oid_hex)
 {
 	static struct hashmap objs;
 	size_t len = strlen(oid_hex);
-	return anonymize_str(&objs, generate_fake_oid, oid_hex, len, NULL);
+	return anonymize_str(&objs, generate_fake_oid, oid_hex, len);
 }
 
 static void show_filemodify(struct diff_queue_struct *q,
@@ -517,7 +516,7 @@ static const char *find_encoding(const char *begin, const char *end)
 	return bol;
 }
 
-static char *anonymize_ref_component(void *data)
+static char *anonymize_ref_component(void)
 {
 	static int counter;
 	struct strbuf out = STRBUF_INIT;
@@ -563,7 +562,7 @@ static char *anonymize_commit_message(const char *old)
 	return xstrfmt("subject %d\n\nbody\n", counter++);
 }
 
-static char *anonymize_ident(void *data)
+static char *anonymize_ident(void)
 {
 	static int counter;
 	struct strbuf out = STRBUF_INIT;
@@ -606,7 +605,7 @@ static void anonymize_ident_line(const char **beg, const char **end)
 
 		len = split.mail_end - split.name_begin;
 		ident = anonymize_str(&idents, anonymize_ident,
-				      split.name_begin, len, NULL);
+				      split.name_begin, len);
 		strbuf_addstr(out, ident);
 		strbuf_addch(out, ' ');
 		strbuf_add(out, split.date_begin, split.tz_end - split.date_begin);
@@ -747,7 +746,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
 	show_progress();
 }
 
-static char *anonymize_tag(void *data)
+static char *anonymize_tag(void)
 {
 	static int counter;
 	struct strbuf out = STRBUF_INIT;
@@ -809,7 +808,7 @@ static void handle_tag(const char *name, struct tag *tag)
 		if (message) {
 			static struct hashmap tags;
 			message = anonymize_str(&tags, anonymize_tag,
-						message, message_size, NULL);
+						message, message_size);
 			message_size = strlen(message);
 		}
 	}
-- 
2.40.0.595.g9b96b494d8c


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 6/6] fast-export: drop unused parameter from anonymize_commit_message()
  2023-03-22 17:36 [PATCH 0/6] minor cleanups to fast-export --anonymize Jeff King
                   ` (4 preceding siblings ...)
  2023-03-22 17:42 ` [PATCH 5/6] fast-export: drop data parameter from anonymous generators Jeff King
@ 2023-03-22 17:43 ` Jeff King
  2023-03-24 17:40 ` [PATCH 0/6] minor cleanups to fast-export --anonymize Derrick Stolee
  6 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2023-03-22 17:43 UTC (permalink / raw)
  To: git

As the comment above the function indicates, we do not bother actually
storing commit messages in our anonymization map. But we still take the
message as a parameter, and just ignore it. Let's stop doing that, which
will make -Wunused-parameter happier.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fast-export.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 12adf75964c..f3cc5486861 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -556,7 +556,7 @@ static const char *anonymize_refname(const char *refname)
  * We do not even bother to cache commit messages, as they are unlikely
  * to be repeated verbatim, and it is not that interesting when they are.
  */
-static char *anonymize_commit_message(const char *old)
+static char *anonymize_commit_message(void)
 {
 	static int counter;
 	return xstrfmt("subject %d\n\nbody\n", counter++);
@@ -683,7 +683,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
 
 	mark_next_object(&commit->object);
 	if (anonymize) {
-		reencoded = anonymize_commit_message(message);
+		reencoded = anonymize_commit_message();
 	} else if (encoding) {
 		switch(reencode_mode) {
 		case REENCODE_YES:
-- 
2.40.0.595.g9b96b494d8c

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/6] minor cleanups to fast-export --anonymize
  2023-03-22 17:36 [PATCH 0/6] minor cleanups to fast-export --anonymize Jeff King
                   ` (5 preceding siblings ...)
  2023-03-22 17:43 ` [PATCH 6/6] fast-export: drop unused parameter from anonymize_commit_message() Jeff King
@ 2023-03-24 17:40 ` Derrick Stolee
  2023-03-24 19:46   ` Junio C Hamano
  6 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2023-03-24 17:40 UTC (permalink / raw)
  To: Jeff King, git

On 3/22/2023 1:36 PM, Jeff King wrote:
> My goal here was cleaning up some -Wunused-parameter warnings, but doing
> so required a few preparatory cleanups, one of which actually fixes a
> really minor bug (in patch 4).

I saw this was on its way to next(?) without any comment, but it
really doesn't need any. Each patch was clear and created a self-
sufficient reason, but the culmination of dropping parameters and
clearing up memory is also good.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/6] minor cleanups to fast-export --anonymize
  2023-03-24 17:40 ` [PATCH 0/6] minor cleanups to fast-export --anonymize Derrick Stolee
@ 2023-03-24 19:46   ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2023-03-24 19:46 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jeff King, git

Derrick Stolee <derrickstolee@github.com> writes:

> I saw this was on its way to next(?) without any comment, but it
> really doesn't need any. Each patch was clear and created a self-
> sufficient reason, but the culmination of dropping parameters and
> clearing up memory is also good.

I also thought these were self evidently ready for 'next'.  Thanks
for a positive confirmation.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-03-24 19:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-22 17:36 [PATCH 0/6] minor cleanups to fast-export --anonymize Jeff King
2023-03-22 17:37 ` [PATCH 1/6] fast-export: drop const when storing anonymized values Jeff King
2023-03-22 17:38 ` [PATCH 2/6] fast-export: simplify initialization of anonymized hashmaps Jeff King
2023-03-22 17:40 ` [PATCH 3/6] fast-export: factor out anonymized_entry creation Jeff King
2023-03-22 17:42 ` [PATCH 4/6] fast-export: de-obfuscate --anonymize-map handling Jeff King
2023-03-22 17:42 ` [PATCH 5/6] fast-export: drop data parameter from anonymous generators Jeff King
2023-03-22 17:43 ` [PATCH 6/6] fast-export: drop unused parameter from anonymize_commit_message() Jeff King
2023-03-24 17:40 ` [PATCH 0/6] minor cleanups to fast-export --anonymize Derrick Stolee
2023-03-24 19:46   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).