git@vger.kernel.org mailing list mirror (one of many)
From: <rsbecker@nexbridge.com>
To: "'Ævar Arnfjörð Bjarmason'" <avarab@gmail.com>,
	"'Florine W. Dekker'" <florine@fwdekker.com>
Cc: "'René Scharfe'" <l.s.r@web.de>,
	git@vger.kernel.org,
	"'brian m . carlson'" <sandals@crustytoothpaste.net>
Subject: RE: Wildcards in mailmap to hide transgender people's deadnames
Date: Mon, 19 Sep 2022 08:27:53 -0400
Message-ID: <004801d8cc23$41216960$c3643c20$@nexbridge.com>
In-Reply-To: <220919.86mtav60wi.gmgdl@evledraar.gmail.com>

On September 19, 2022 7:20 AM, Ævar Arnfjörð Bjarmason wrote:
>On Wed, Sep 14 2022, Florine W. Dekker wrote:
>
>> On 14/09/2022 09:40, René Scharfe wrote:
>>> Am 13.09.22 um 23:53 schrieb Florine W. Dekker:
>>>> Now, John can add the following line to their mailmap config:
>>>> `John Doe <john.doe@example.com> <\*.doe@example.com>`, which does
>>>> not reveal their old name.
>>> That would falsely attribute the work of possible future developers
>>> ann.doe@example.com and bob.doe@example.com to John as well.
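The over-matching René describes can be illustrated with a small sketch. It assumes the proposed wildcard would behave like a shell-style glob, which the thread does not specify; Python's `fnmatch` stands in for whatever matcher an implementation would use, and the address list is made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical wildcard taken from the proposed mailmap line.
# Assumption: the wildcard behaves like a shell-style glob.
PATTERN = "*.doe@example.com"


def matches(pattern: str, email: str) -> bool:
    """Return True if the email matches the glob-style pattern."""
    return fnmatchcase(email, pattern)


# Made-up committer addresses, including two future contributors.
candidates = [
    "john.doe@example.com",
    "ann.doe@example.com",
    "bob.doe@example.com",
    "john.smith@example.com",
]

# Every address ending in ".doe@example.com" is attributed to John.
matched = [email for email in candidates if matches(PATTERN, email)]
print(matched)
```

Under that assumption, Ann's and Bob's addresses match the same pattern as John's old one, which is the false attribution being pointed out.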
>
>First, I'm very happy to see that someone has picked up the thread on this again.
>
>> Good point. I assumed such false positives would be unlikely because I
>> was considering very-small-scale projects, but I agree that using
>> wildcards is not at all feasible for larger projects.
>
>Yes, please, making the mapping fuzzy in any way is really going against the core
>design of the mailmap mechanism, it should be unambiguous,
>*also* for commits going forward.
>
>>> Supporting hashed entries would allow for a more targeted obfuscation.
>>> That was discussed a while ago:
>>> https://lore.kernel.org/git/20210103211849.2691287-1-sandals@crustytoothpaste.net/
>>
>> That was an interesting read. I agree with Ævar in that thread in that
>> I think URL encoding is sufficient. I think it meets Brian's use case
>> of never having to see the old name again, and my use case of
>> obfuscating it from accidental discovery by friendly collaborators.
>
>The question that was left open in my mind after that previous discussion was
>whether people who wanted the "deadname" feature would find this acceptable,
>I don't think we got any explicit ACK/NACK on that (but I may be misrecalling, and
>didn't go back & re-read the whole thing).
>
>I'm happy that there's at least one ACK to it here in the form of your reply, and
>hopefully that represents what a wider audience would prefer.
>
>> While a hash certainly gives a stronger sense of security, I think
>> it's a false sense of security, because, as you note below, recovering
>> old email addresses from the tree is not much more trivial than
>> reversing the encoding. And either way, a sha256 hash can easily be
>> inverted in a few days(?) using a dictionary attack with email
>> addresses from data breaches.
>
>It's going to be "milliseconds", not "days". Brute-forcing a SHA-256 to find an
>unknown E-Mail address might take longer, but by definition for a .mailmap entry
>you already have both sides.
>
>So "brute-forcing" is just a matter of hashing authors & E-Mails in our history, and
>seeing if they correspond to .mailmap entries.
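A minimal sketch of the lookup Ævar describes, under stated assumptions: the hashed mailmap entry is taken to be the plain SHA-256 hex digest of the old address (the earlier proposal may have used a different construction), and the `history` list stands in for the addresses one would collect with something like `git log --format=%ae`. The function names are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a string, encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def recover(hashed_entries, history_emails):
    """Map each hashed entry back to the history address that produced it."""
    # One hash per address seen in history; no guessing is involved.
    lookup = {sha256_hex(email): email for email in history_emails}
    return {h: lookup[h] for h in hashed_entries if h in lookup}


# Made-up data: addresses found in commit history, and one hashed
# mailmap entry (assumed format: bare SHA-256 of the old address).
history = ["old.name@example.com", "john.doe@example.com", "ann@example.org"]
hashed = {sha256_hex("old.name@example.com")}

print(recover(hashed, history))
```

The cost is one hash per distinct address in the history, which is why the estimate is milliseconds rather than days.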
>
>> As someone who has changed her name, I would be content with using a
>> simple URL encoding.
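For illustration, here is roughly what "over-encoding" an address could look like, assuming the feature would use ordinary percent-encoding of UTF-8 bytes. This is a sketch of the idea only, not of anything Git implements; `over_encode` is a hypothetical helper.

```python
from urllib.parse import unquote


def over_encode(text: str) -> str:
    """Percent-encode every UTF-8 byte, including plain ASCII letters."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


# Made-up old address; the encoded form hides it from a casual reader.
encoded = over_encode("old.name@example.com")
print(encoded)

# Decoding is a single standard-library call, so this is obfuscation only.
print(unquote(encoded))
```

The round trip shows both halves of the trade-off discussed in the thread: the value is not readable at a glance, and anyone who wants it can decode it trivially.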
>
>I'd be happy to have that as a feature, in particular because (as I pointed out in the
>previous discussion) it has a large use-case outside of this .mailmap topic, namely
>wanting to map e.g. mis-encoded author names in past commits to the right
>encoding (which I've personally had some use-cases for).
>
>There might be other "bonus" use-cases I've missed. E.g. is ">" or "<"
>allowed in obscure E-Mail addresses (maybe within quotes?), our current parser
>would barf on it, but being able to URI-encode it would work around that. I don't
>know offhand to what extent there's an overlap with various RFC-pedantic E-Mail
>addresses one could come up with, and what we'd accept in commit objects with
>"fsck".
>
>In any case, I think that an implementation of this & patch to
>gitmailmap(5) should explain this sort of feature in those terms. If some people
>then find it useful to encode things in the ASCII-space for some reason (e.g. the
>social "deadname" reason) that would also be useful.
>
>But in terms of the docs I don't think it should be documented in that way. Git just
>needs to provide the feature, we don't need to dictate how & why someone
>might use it.
>
>>> [...]
>>>     $ git log --format='%ae %aE' |
>>>       awk '$1 != $2 && !a[$0] {a[$0] = 1; print}' |
>>>       grep -F l.s.r@web.de
>>>     rene.scharfe@lsrfire.ath.cx l.s.r@web.de
>>>
>>> The same can be done with names (%an/%aN).
>>
>> You're absolutely right. With "advanced tools" I was referring to
>> anything more advanced than a plain `git log` ;-)
>
>The thing that still makes me a bit nervous on this topic is that we need to make it
>really clear that we're *not* providing some promise of obscuring these values
>going forward, but just providing a feature that some people might rely on as a
>combined social mechanism, and with the assumption that the defaults of the "git
>log" view are unlikely to change.
>
>I.e. I think a "deadname" use-case of this would probably:
>
>* Have some comment at the top of .mailmap about why some values are
>  over-encoded (or perhaps it would be obvious to everyone working on
>  that repo why someone was encoding the "plain ASCII" A-Za-z0-9 space).
>
>* Use the default "git log" view, where we happen to map these (given
>  the right options, config etc.)
>
>But should not:
>
>* Assume that other tools such as "fsck", "check-mailmap" or even "log"
>  won't have future features that make de-obscuring these values easier,
>  or something that's part of a normal workflow.
>
>  E.g. I've wanted a "fsck for mailmap" for a while, i.e. to scan the
>  file, parse our history, and see which entries are redundant or even
>  potentially missing (based on e.g. names matching, but having
>  different E-Mail addresses).
>
>  It would be hard not to de-obscure URI encoded values for some
>  features like that, e.g. if "log" adds the ability to say "this name X
>  was mapped from Y".
>
>* In general pretend that the mailmap is anything but a *public* and
>  easily readable mapping. It's inherent in the feature that the
>  consumer of it will know that X used to be Y.
>
>The last thing we want is to create some feature that effectively ends up being
>some self-doxxing (or self-"de-deadnaming"?) mechanism, because we've left a
>gap between user expectations and what we can realistically provide.

As a side topic, which I brought up about 2 years ago, there are other reasons to do this, including GDPR-like rules, to obfuscate identity information. A solution to obfuscation could provide a mechanism to change the attribution. My team has experience in this domain. Do we want to reopen that discussion?

-Randall


