git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Philip Oakley <philipoakley@iee.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Duy Nguyen <pclouds@gmail.com>
Subject: Re: [PATCH v2] teach fast-export an --anonymize option
Date: Sat, 23 Aug 2014 02:19:32 -0400
Message-ID: <20140823061932.GB18256@peff.net>
In-Reply-To: <9E69746EFF0048FF84631A9794206566@PhilipOakley>

On Fri, Aug 22, 2014 at 07:39:59PM +0100, Philip Oakley wrote:

> Just a bit of bikeshedding for future improvements..
> 
> The .gitignore is another potential user problem area that may benefit from
> not being anonymised when problems strike.

Thanks, I had meant to mention some implications for .gitmodules here,
but forgot about .gitignore (and .gitattributes!).

For any git-specific files like this, we have two challenges:

  1. We've munged their filenames (so .gitignore is probably path123
     now).

  2. We'll have munged their contents. So even if we left the file as
     .gitignore, it will have junk in it.

Fixing (1) is pretty easy. I structured all of the anonymizing functions
to take the old values, even though most of them just throw them away
entirely (which is a good way to be sure you're not leaking anything!).
But we could pass through a few specific ones.

However, that doesn't help us if the contents are still munged (in fact
it's worse, because git will be annoyed that your .gitmodules file
contains unparseable crap). So how do we munge those files?

It depends on the individual file, I think, and what the user wants to
protect.

For .gitignore and .gitattributes, we can translate the pathnames
contained in the file. But that doesn't work in the general case,
because the file could have wildcards or other non-literal syntax.
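
To make the limitation concrete, here is a minimal sketch of translating
only the literal entries. It is not code from the patch: translate_ignore,
is_literal, and path_map (a mapping from original to anonymized paths) are
hypothetical names, and the set of "special" characters is a rough
approximation of gitignore syntax. Comments are dropped on the assumption
that they could leak information.

```python
from typing import Dict, Optional

# Characters that (roughly) mark a gitignore line as non-literal.
# This is an approximation for illustration, not git's own parser.
GLOB_CHARS = set("*?[]\\!")


def is_literal(pattern: str) -> bool:
    """True when the line contains none of the special characters above."""
    return not any(ch in GLOB_CHARS for ch in pattern)


def translate_ignore(text: str, path_map: Dict[str, str]) -> Optional[str]:
    """Rewrite each literal path via path_map.

    Returns None as soon as one line cannot be translated safely
    (wildcard, negation, or a path we have no mapping for).
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments entirely
        if not is_literal(stripped) or stripped not in path_map:
            return None
        out.append(path_map[stripped])
    return "\n".join(out) + "\n"
```

A file containing only "build/out.log" translates cleanly, while a single
"*.o" line makes the whole file untranslatable, which is the general-case
problem described above.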

For .gitmodules, I think it's all-or-nothing. Either the user is OK
sharing the URLs of their submodules or not (we could munge _just_ the
URLs, but it's not like the result would be remotely functional).

So while we might be able to get some things working on the .gitignore
side, I kind of think the simplest way forward is just adding finer
granularity for the user. Let them say "my filenames are OK to share
because they're part of the problem, but just make sure you hide my
commit messages and file contents".

And then if you're not munging filenames, we would turn off .gitignore
and .gitattributes munging. The implementation is not too hard.
export_blob does not have the path of the blob, but we generate the list
of blobs to export from a diff, so we can feed the path that way.  That
technically misses a case where you have a blob at path "X", we
anonymize it, and then you later move it to ".gitignore", which would
not be anonymized. But that is unlikely enough that it is probably not
worth worrying about.
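
The path-feeding idea and its corner case can be sketched as follows. This
is a toy model under stated assumptions, not fast-export's code:
export_blobs, PASSTHROUGH, and the (path, content) pairs are hypothetical,
and a plain SHA-1 of the content stands in for a real blob id.

```python
import hashlib

# Files whose contents pass through when filenames are not being munged.
PASSTHROUGH = {".gitignore", ".gitattributes"}


def export_blobs(changes, anonymize_paths):
    """Export each distinct blob once, deciding by the path it is first seen at.

    changes: iterable of (path, content) pairs, as a diff would yield them.
    Returns a dict mapping the stand-in blob id to the exported content.
    """
    exported = {}
    for path, content in changes:
        blob_id = hashlib.sha1(content).hexdigest()  # stand-in, not git's id
        if blob_id in exported:
            # Already emitted under an earlier path; the decision made
            # then sticks, which is the corner case mentioned above.
            continue
        basename = path.rsplit("/", 1)[-1]
        keep = not anonymize_paths and basename in PASSTHROUGH
        exported[blob_id] = content if keep else b"anonymous blob %d" % len(exported)
    return exported
```

Feeding it ("X", b"*.o\n") followed by (".gitignore", b"*.o\n") exports the
content only once, under the decision made for "X".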

> For example, there's a current
> problem on the git-users list
> https://groups.google.com/forum/#!topic/git-users/JJFIEsI5HRQ about "git
> clean vs git status re .gitignore", which would then also beg questions
> about retaining file extensions/suffixes (.txt, .o, .c, etc).

Yeah, I think retaining extensions would be a reasonable option (and you
would probably use it with an option to retain .gitattributes or
.gitignore whole if you were confident that those files did not have
anything private and just used extension wildcards).
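
As an illustration of what an extension-retaining mode might do, here is a
small sketch. PathAnonymizer and keep_extension are hypothetical names, and
the "pathN" naming simply mirrors the "path123" example earlier in this
message; none of it is taken from the patch.

```python
import posixpath


class PathAnonymizer:
    """Map each path component to "pathN", optionally keeping its extension."""

    def __init__(self, keep_extension=False):
        self.keep_extension = keep_extension
        self.seen = {}  # original component -> anonymized component

    def component(self, name):
        if name not in self.seen:
            # splitext() treats a leading dot as part of the name, so a
            # component such as ".gitignore" has no extension to retain.
            ext = posixpath.splitext(name)[1] if self.keep_extension else ""
            self.seen[name] = "path%d%s" % (len(self.seen), ext)
        return self.seen[name]

    def path(self, full):
        return "/".join(self.component(c) for c in full.split("/"))
```

With keep_extension=True, "src/main.c" becomes "path0/path1.c", so a
"*.c" wildcard in an untouched .gitattributes or .gitignore would still
match.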

> One thought is that the user should be able to, as an option, select the
> number of initial characters retained from filenames, and similarly, the
> option to retain the file extension, and possibly directory names, such that
> the full .gitignore still works in most cases, and the sort order works (as
> far as it goes on number of characters).

Yeah, those all seem reasonable.
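
A prefix-retaining option could look roughly like the sketch below.
anonymize_names and keep_prefix are hypothetical names used only for
illustration. Note that whatever is retained is also leaked, so this trades
some privacy for keeping sort order intact up to the retained characters.

```python
def anonymize_names(names, keep_prefix=0):
    """Replace each name with its first keep_prefix characters plus "pathN"."""
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = name[:keep_prefix] + "path%d" % len(mapping)
    return mapping
```

With keep_prefix=2, "zebra.c" maps to "zepath0" and still sorts after
anything beginning with "ap".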

> All things for future improvers to consider.

Agreed. I wanted to go through your list not because I want to implement
any of those things right now, but because I wanted to make sure that
there was nothing in my approach that would preclude us from building
those things later. And I don't think there is (and I'd be happy if
somebody else felt like building them on top, now or later).

-Peff
