git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Duy Nguyen <pclouds@gmail.com>
Subject: Re: [PATCH] teach fast-export an --anonymize option
Date: Thu, 21 Aug 2014 18:41:50 -0400
Message-ID: <20140821224150.GA21105@peff.net>
In-Reply-To: <xmqqmwaxr44x.fsf@gitster.dls.corp.google.com>

On Thu, Aug 21, 2014 at 01:15:10PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > +/*
> > + * We anonymize each component of a path individually,
> > + * so that paths a/b and a/c will share a common root.
> > + * The paths are cached via anonymize_mem so that repeated
> > + * lookups for "a" will yield the same value.
> > + */
> > +static void anonymize_path(struct strbuf *out, const char *path,
> > +			   struct hashmap *map,
> > +			   char *(*generate)(const char *, size_t *))
> > +{
> > +	while (*path) {
> > +		const char *end_of_component = strchrnul(path, '/');
> > +		size_t len = end_of_component - path;
> > +		const char *c = anonymize_mem(map, generate, path, &len);
> > +		strbuf_add(out, c, len);
> > +		path = end_of_component;
> > +		if (*path)
> > +			strbuf_addch(out, *path++);
> > +	}
> > +}
> 
> Do two paths sort the same way before and after anonymisation?  For
> example, if generate() works as a simple substitution, it should map
> a character that sorts before (or after) '/' with another that also
> sorts before (or after) '/' for us to be able to diagnose an error
> that comes from D/F sort order confusion.

No, the sort order is totally lost. I'd be afraid that a general scheme
would end up leaking information about what was in the filenames. It
might be acceptable to leak some information here, though, if it adds to
the realism of the result.

I tried here to lay the basic infrastructure and do the simplest thing
that might work, so we could evaluate proposals like that independently
(and also because I didn't come up with a clever enough algorithm to do
what you're asking).  Patches welcome on top. :)

-Peff

Thread overview: 21+ messages
2014-08-21  7:01 [PATCH] teach fast-export an --anonymize option Jeff King
2014-08-21 20:15 ` Junio C Hamano
2014-08-21 22:41   ` Jeff King [this message]
2014-08-21 21:57 ` Junio C Hamano
2014-08-21 22:49   ` Jeff King
2014-08-21 23:21     ` [PATCH v2] " Jeff King
2014-08-22 13:06       ` Duy Nguyen
2014-08-22 18:39       ` Philip Oakley
2014-08-23  6:19         ` Jeff King
2014-08-27 16:01       ` Junio C Hamano
2014-08-27 16:58         ` Jeff King
2014-08-27 17:01           ` [PATCH v3] " Jeff King
2014-08-28 10:30             ` Duy Nguyen
2014-08-28 12:32               ` Jeff King
2014-08-28 16:46                 ` Ramsay Jones
2014-08-28 18:43                   ` Junio C Hamano
2014-08-28 18:50                   ` Jeff King
2014-08-28 18:11                 ` Junio C Hamano
2014-08-28 19:04                   ` Jeff King
2014-08-31 10:34                 ` Eric Sunshine
2014-08-31 15:53                   ` Jeff King
