git@vger.kernel.org mailing list mirror (one of many)
* Deadname rewriting
@ 2019-06-15  1:54 Phil Hord
  2019-06-15  7:27 ` Andreas Schwab
  2019-06-15  8:19 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 8+ messages in thread
From: Phil Hord @ 2019-06-15  1:54 UTC (permalink / raw)
  To: Git

I know name-scrubbing is already covered in filter-branch and other
places. But we have a scenario becoming more common that makes it a
more sensitive topic.

At $work we have a long time employee who has changed their name from
Alice to Bob.  Bob doesn't want anyone to call him "Alice" anymore and
is prone to be offended if they do.  This is called "deadnaming".

We are able to convince most of our work tools to expunge the deadname
from usage anywhere, but git stubbornly calls Bob "Alice" whenever
someone asks for "git blame" or checks in "git log".

We could rewrite history with filter-branch, but that's quite
disruptive.  I found some alternatives.

.mailmap seems perfect for this task, but it doesn't work everywhere
(blame, log, etc.).  Also, it requires the deadname to be forever
proclaimed in the .mailmap file itself.
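For reference, the mapping itself can be tried in a scratch repository. This is only a minimal sketch: the identities ("Alice Smith" / "Bob Smith" at myco.example) are placeholders, and the throwaway directory comes from mktemp. `git check-mailmap` reports how a given identity is displayed once the map is applied:

```shell
# Scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# .mailmap line format: Proper Name <proper@email> <commit@email>
# (names and addresses are placeholders for illustration).
printf '%s\n' 'Bob Smith <bob.smith@myco.example> <alice.smith@myco.example>' > .mailmap

# Ask git how the old identity is displayed once the map is applied.
mapped=$(git check-mailmap 'Alice Smith <alice.smith@myco.example>')
echo "$mapped"
```

Note that the old address still has to appear in the file for the match to work, which is the drawback described above.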

`git replace` works rather nicely, except all of Bob's old commits
show "replaced" in the decorator list. Also, it doesn't propagate well
from the central server since the `refs/replace` namespace isn't fetched
normally.  But in case anyone wants it, here's what I did:

# Rewrite name and email in each of Alice's commits via a replacement
# object (sed -i as used here assumes GNU sed).
git log --author=alice.smith --format="%h" --all |
   while read hash ; do
      GIT_EDITOR="sed -i -e 's/Alice Smith/Bob Smith/g' -e 's/alice.smith/bob.smith/'" \
      git replace --edit "$hash"
   done
git push origin 'refs/replace/*:refs/replace/*'
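On the receiving side, the replacement refs only show up if a client opts in to fetching them. A rough sketch of that, using two throwaway repositories (the "server" and "client" directories and the Alice/Bob identities are made up for the demonstration, and the replacement is built with `git hash-object` rather than an editor so it runs unattended):

```shell
work=$(mktemp -d)

# "Server" repository with one commit recorded under the old name.
git init -q "$work/server"
git -C "$work/server" -c user.name='Alice Smith' -c user.email='alice.smith@myco.example' \
    commit -q --allow-empty -m 'initial commit'
old=$(git -C "$work/server" rev-parse HEAD)

# Build the replacement commit without an editor, then register it.
new=$(git -C "$work/server" cat-file commit "$old" |
      sed -e 's/Alice Smith/Bob Smith/g' -e 's/alice\.smith/bob.smith/g' |
      git -C "$work/server" hash-object -t commit -w --stdin)
git -C "$work/server" replace "$old" "$new"

# A plain clone does not bring refs/replace/* along ...
git clone -q "$work/server" "$work/client"
before=$(git -C "$work/client" for-each-ref refs/replace | wc -l | tr -d ' ')

# ... unless the client opts in with an extra fetch refspec.
git -C "$work/client" config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'
git -C "$work/client" fetch -q origin
after=$(git -C "$work/client" for-each-ref refs/replace | wc -l | tr -d ' ')
author=$(git -C "$work/client" log -1 --format='%an')
echo "replace refs before: $before, after: $after, author shown: $author"
```

Every clone would need that extra refspec, which is why this does not propagate well on its own.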

I'd quite like the .mailmap solution to work, and I might flesh that
out some day.

It feels like `.git/info/grafts` would work the best if it could be
distributed with the project, but I'm pretty sure that's a non-starter
for many reasons.

Any other ideas?  Has anyone here encountered this already?


* Re: Deadname rewriting
  2019-06-15  1:54 Deadname rewriting Phil Hord
@ 2019-06-15  7:27 ` Andreas Schwab
  2019-06-15  8:19 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 8+ messages in thread
From: Andreas Schwab @ 2019-06-15  7:27 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git

On Jun 14 2019, Phil Hord <phil.hord@gmail.com> wrote:

> It feels like `.git/info/grafts` would work the best if it could be
> distributed with the project, but I'm pretty sure that's a non-starter
> for many reasons.

The graft file is obsoleted by git replace.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: Deadname rewriting
  2019-06-15  1:54 Deadname rewriting Phil Hord
  2019-06-15  7:27 ` Andreas Schwab
@ 2019-06-15  8:19 ` Ævar Arnfjörð Bjarmason
  2019-06-17 21:21   ` Philip Oakley
  2019-06-21 21:12   ` Phil Hord
  1 sibling, 2 replies; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-06-15  8:19 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git, CB Bailey


On Sat, Jun 15 2019, Phil Hord wrote:

> I know name-scrubbing is already covered in filter-branch and other
> places. But we have a scenario becoming more common that makes it a
> more sensitive topic.
>
> At $work we have a long time employee who has changed their name from
> Alice to Bob.  Bob doesn't want anyone to call him "Alice" anymore and
> is prone to be offended if they do.  This is called "deadnaming".
>
> We are able to convince most of our work tools to expunge the deadname
> from usage anywhere, but git stubbornly calls Bob "Alice" whenever
> someone asks for "git blame" or checks in "git log".
>
> We could rewrite history with filter-branch, but that's quite
> disruptive.  I found some alternatives.
>
> .mailmap seems perfect for this task, but it doesn't work everywhere
> (blame, log, etc.).  Also, it requires the deadname to be forever
> proclaimed in the .mailmap file itself.
>
> `git replace` works rather nicely, except all of Bob's old commits
> show "replaced" in the decorator list. Also, it doesn't propagate well
> from the central server since the `refs/replace` namespace isn't fetched
> normally.  But in case anyone wants it, here's what I did:
>
> git log --author=alice.smith --format="%h" --all |
>    while read hash ; do
>       GIT_EDITOR="sed -i -e 's/Alice Smith/Bob Smith/g' -e 's/alice.smith/bob.smith/'" \
>       git replace --edit "$hash"
>    done
> git push origin 'refs/replace/*:refs/replace/*'
>
> I'd quite like the .mailmap solution to work, and I might flesh that
> out some day.
>
> It feels like `.git/info/grafts` would work the best if it could be
> distributed with the project, but I'm pretty sure that's a non-starter
> for many reasons.
>
> Any other ideas?  Has anyone here encountered this already?

What should be done is to extend the .mailmap support to other
cases. I.e. make tools like blame, shortlog etc. show the equivalent of
%aN and %aE by default.

This topic was discussed at the last git contributor summit (brought up
by CB Bailey) resulting in this patch, which I see didn't make it in &
needs to be resurrected again:
https://public-inbox.org/git/20181212171052.13415-1-cb@hashpling.org/

So, patches welcome :)

What's not going to be supported is some notion of 100% forgetting that
there was ever an Alice that's now called Bob. They did in fact create
commit objects with "Alice" in them, and low-level plumbing like
"cat-file -p <commit>" is always going to show that, and there's going
to be the mapping in .mailmap.

But as far as porcelain UI things that would show the mailmapped value
goes those can be made to always show "Bob".
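The split between plumbing and porcelain can be seen in a scratch repository. This is a sketch under assumed placeholder identities (Alice/Bob Smith at myco.example), not a statement about which commands honor the map by default:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name='Alice Smith' -c user.email='alice.smith@myco.example' \
    commit -q --allow-empty -m 'initial commit'
printf '%s\n' 'Bob Smith <bob.smith@myco.example> <alice.smith@myco.example>' > .mailmap

# Plumbing prints the commit object exactly as it was stored.
plumbing=$(git cat-file -p HEAD | sed -n 's/^author \(.*\) <.*/\1/p')
# Porcelain can apply the map before display.
porcelain=$(git log -1 --use-mailmap | sed -n 's/^Author: \(.*\) <.*/\1/p')
echo "cat-file: $plumbing / log: $porcelain"
```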

Unless of course your $work is willing to completely rewrite the repo...


* Re: Deadname rewriting
  2019-06-15  8:19 ` Ævar Arnfjörð Bjarmason
@ 2019-06-17 21:21   ` Philip Oakley
  2019-06-17 22:33     ` Ævar Arnfjörð Bjarmason
  2019-06-21 21:12   ` Phil Hord
  1 sibling, 1 reply; 8+ messages in thread
From: Philip Oakley @ 2019-06-17 21:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Phil Hord; +Cc: Git, CB Bailey

On 15/06/2019 09:19, Ævar Arnfjörð Bjarmason wrote:
> On Sat, Jun 15 2019, Phil Hord wrote:
>
>> I know name-scrubbing is already covered in filter-branch and other
>> places. But we have a scenario becoming more common that makes it a
>> more sensitive topic.
>>
>> At $work we have a long time employee who has changed their name from
>> Alice to Bob.  Bob doesn't want anyone to call him "Alice" anymore and
>> is prone to be offended if they do.  This is called "deadnaming".
>>
>> We are able to convince most of our work tools to expunge the deadname
>> from usage anywhere, but git stubbornly calls Bob "Alice" whenever
>> someone asks for "git blame" or checks in "git log".
>>
>> We could rewrite history with filter-branch, but that's quite
>> disruptive.  I found some alternatives.
>>
>> .mailmap seems perfect for this task, but it doesn't work everywhere
>> (blame, log, etc.).  Also, it requires the deadname to be forever
>> proclaimed in the .mailmap file itself.
>>
>> `git replace` works rather nicely, except all of Bob's old commits
>> show "replaced" in the decorator list. Also, it doesn't propagate well
>> from the central server since the `refs/replace` namespace isn't fetched
>> normally.  But in case anyone wants it, here's what I did:
>>
>> git log --author=alice.smith --format="%h" --all |
>>     while read hash ; do
>>        GIT_EDITOR="sed -i -e 's/Alice Smith/Bob Smith/g' -e 's/alice.smith/bob.smith/'" \
>>        git replace --edit "$hash"
>>     done
>> git push origin 'refs/replace/*:refs/replace/*'
>>
>> I'd quite like the .mailmap solution to work, and I might flesh that
>> out some day.
>>
>> It feels like `.git/info/grafts` would work the best if it could be
>> distributed with the project, but I'm pretty sure that's a non-starter
>> for many reasons.
>>
>> Any other ideas?  Has anyone here encountered this already?
> What should be done is to extend the .mailmap support to other
> cases. I.e. make tools like blame, shortlog etc. show the equivalent of
> %aN and %aE by default.
>
> This topic was discussed at the last git contributor summit (brought up
> by CB Bailey) resulting in this patch, which I see didn't make it in &
> needs to be resurrected again:
> https://public-inbox.org/git/20181212171052.13415-1-cb@hashpling.org/
>
> So, patches welcome :)
>
> What's not going to be supported is some notion of 100% forgetting that
> there was ever an Alice that's now called Bob. They did in fact create
> commit objects with "Alice" in them, and low-level plumbing like
> "cat-file -p <commit>" is always going to show that, and there's going
> to be the mapping in .mailmap.
>
> But as far as porcelain UI things that would show the mailmapped value
> goes those can be made to always show "Bob".
>
> Unless of course your $work is willing to completely rewrite the repo...
This may become a bigger issue for corporates, one that prevents Git
from being used because it doesn't handle the _legal requirements_ for
proper current `known-by:` naming.

I found this [1] on the UK Parliament website that also covers
'deadnaming', and the potential misunderstandings about what is (and is
not) an (unnecessary) 'legal name'.

It may be an option for the SHA1 transition to also include, as an 
independent step, the appropriate mailmap conversion for dead-names 
(which is a private document owned by the hosting repo owner - see GDPR 
Data Controller responsibilities).

If author/committer renaming is done as part of a full hash conversion 
(with a golden repo providing hash mapping) then it is less of a problem 
for a one-shot conversion, but still an issue for everyday name changes 
(including those from divorce, adoption, etc). Maybe even convert (swap) 
the ascii/utf-8 names for unique hashes (in the repo) for reverse look 
up of the latest known-by name (getting a bit complicated here)

The distributed nature of the classic Git open source usage may have 
similar issues to that of gmane, where it pulled the hosting of email 
lists. A legal case is likely needed before any level of clarification 
is obtained (which will still have overlaps!)

The mailmap is probably not the right place for holding deadname 
conversions as they should not be public, but it may be a partial 
workaround to reduce visibility of deadnames.

Philip

[1] 
https://publications.parliament.uk/pa/cm201516/cmselect/cmwomeq/390/39009.htm


* Re: Deadname rewriting
  2019-06-17 21:21   ` Philip Oakley
@ 2019-06-17 22:33     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-06-17 22:33 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Phil Hord, Git, CB Bailey


On Mon, Jun 17 2019, Philip Oakley wrote:

> On 15/06/2019 09:19, Ævar Arnfjörð Bjarmason wrote:
>> On Sat, Jun 15 2019, Phil Hord wrote:
>>
>>> I know name-scrubbing is already covered in filter-branch and other
>>> places. But we have a scenario becoming more common that makes it a
>>> more sensitive topic.
>>>
>>> At $work we have a long time employee who has changed their name from
>>> Alice to Bob.  Bob doesn't want anyone to call him "Alice" anymore and
>>> is prone to be offended if they do.  This is called "deadnaming".
>>>
>>> We are able to convince most of our work tools to expunge the deadname
>>> from usage anywhere, but git stubbornly calls Bob "Alice" whenever
>>> someone asks for "git blame" or checks in "git log".
>>>
>>> We could rewrite history with filter-branch, but that's quite
>>> disruptive.  I found some alternatives.
>>>
>>> .mailmap seems perfect for this task, but it doesn't work everywhere
>>> (blame, log, etc.).  Also, it requires the deadname to be forever
>>> proclaimed in the .mailmap file itself.
>>>
>>> `git replace` works rather nicely, except all of Bob's old commits
>>> show "replaced" in the decorator list. Also, it doesn't propagate well
>>> from the central server since the `refs/replace` namespace isn't fetched
>>> normally.  But in case anyone wants it, here's what I did:
>>>
>>> git log --author=alice.smith --format="%h" --all |
>>>     while read hash ; do
>>>        GIT_EDITOR="sed -i -e 's/Alice Smith/Bob Smith/g' -e 's/alice.smith/bob.smith/'" \
>>>        git replace --edit "$hash"
>>>     done
>>> git push origin 'refs/replace/*:refs/replace/*'
>>>
>>> I'd quite like the .mailmap solution to work, and I might flesh that
>>> out some day.
>>>
>>> It feels like `.git/info/grafts` would work the best if it could be
>>> distributed with the project, but I'm pretty sure that's a non-starter
>>> for many reasons.
>>>
>>> Any other ideas?  Has anyone here encountered this already?
>> What should be done is to extend the .mailmap support to other
>> cases. I.e. make tools like blame, shortlog etc. show the equivalent of
>> %aN and %aE by default.
>>
>> This topic was discussed at the last git contributor summit (brought up
>> by CB Bailey) resulting in this patch, which I see didn't make it in &
>> needs to be resurrected again:
>> https://public-inbox.org/git/20181212171052.13415-1-cb@hashpling.org/
>>
>> So, patches welcome :)
>>
>> What's not going to be supported is some notion of 100% forgetting that
>> there was ever an Alice that's now called Bob. They did in fact create
>> commit objects with "Alice" in them, and low-level plumbing like
>> "cat-file -p <commit>" is always going to show that, and there's going
>> to be the mapping in .mailmap.
>>
>> But as far as porcelain UI things that would show the mailmapped value
>> goes those can be made to always show "Bob".
>>
>> Unless of course your $work is willing to completely rewrite the repo...
> This may become a bigger issue for corporates, one that prevents Git
> from being used because it doesn't handle the _legal requirements_ for
> proper current `known-by:` naming.
>
> I found this [1] on the UK Parliament website that also covers
> 'deadnaming', and the potential misunderstandings about what is (and
> is not) an (unnecessary) 'legal name'.
>
> It may be an option for the SHA1 transition to also include, as an
> independent step, the appropriate mailmap conversion for dead-names
> (which is a private document owned by the hosting repo owner - see
> GDPR Data Controller responsibilities).
>
> If author/committer renaming is done as part of a full hash conversion
> (with a golden repo providing hash mapping) then it is less of a
> problem for a one-shot conversion, but still an issue for everyday
> name changes (including those from divorce, adoption, etc). Maybe even
> convert (swap) the ascii/utf-8 names for unique hashes (in the repo)
> for reverse look up of the latest known-by name (getting a bit
> complicated here)
>
> The distributed nature of the classic Git open source usage may have
> similar issues to that of gmane, where it pulled the hosting of email
> lists. A legal case is likely needed before any level of clarification
> is obtained (which will still have overlaps!)
>
> The mailmap is probably not the right place for holding deadname
> conversions as they should not be public, but it may be a partial
> workaround to reduce visibility of deadnames.
>
> [1]
> https://publications.parliament.uk/pa/cm201516/cmselect/cmwomeq/390/39009.htm

I don't see how tacking this onto the SHA-1->SHA-256 hash transition
could ever work.

Ignoring the issues with how it wouldn't work with the current plan as
designed, you'd have a mostly 1:1 mapping between the two hashes, except
in cases where commits from "Alice" wouldn't map 1:1, because they'd
been subject to some mailmap munging, which you didn't have access to,
mapping them to "Bob".

I'd think that in the context of a non-public work repository you'd just
leak the same information anyway. People wanting to drop a deadname are
relatively rare, so it wouldn't be hard in most cases to infer the
missing data, and you'd be back at square one.

That, and even if it somehow worked the hash transition is a one-off
thing, and you'd presumably want this on an ongoing basis, so you'd need
some other mechanism.

I think there's certainly a place for better support within git for
gracefully handling complete history rewriting in a centralized
workflow.

That sort of thing is useful for other things, e.g. you might want to
rewrite out some old big blob in your history, or erase any record that
you used to use an indenting style that's not in fashion.

So basically an improvement to the refs/replace facility where clients
would opt in to eagerly forget the old history, and since it's your $work
you'd have a central server where you could enforce the eventual full
transition, and clients wouldn't get upset about the non-fast-forward.


* Re: Deadname rewriting
  2019-06-15  8:19 ` Ævar Arnfjörð Bjarmason
  2019-06-17 21:21   ` Philip Oakley
@ 2019-06-21 21:12   ` Phil Hord
  2019-06-21 21:34     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 8+ messages in thread
From: Phil Hord @ 2019-06-21 21:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git, CB Bailey

On Sat, Jun 15, 2019 at 1:19 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Sat, Jun 15 2019, Phil Hord wrote:
>
> > At $work we have a long time employee who has changed their name from
> > Alice to Bob.  Bob doesn't want anyone to call him "Alice" anymore and
> > is prone to be offended if they do.  This is called "deadnaming".
...
> What should be done is to extend the .mailmap support to other
> cases. I.e. make tools like blame, shortlog etc. show the equivalent of
> %aN and %aE by default.

It seems that shortlog and blame do use %aE and %aN by default.  Even
log does.  It is only because I didn't know about %aN 10 years ago
that my custom log format does not.

It's a pity the format author has the option to ignore the mailmap. I
think it's a choice commonly made by mistake rather than intention.  I
wonder if anyone would mind a forced-override config.  Maybe a force
flag in the .mailmap file itself.

           <cto@company.xx>                       <cto@coompany.xx>
           Other Author <other@author.xx>   nick2 <bugs@company.xx>
           Bob Doe <bob.doe@myco.com>             <alice.doe@myco.com>  --force


> This topic was discussed at the last git contributor summit (brought up
> by CB Bailey) resulting in this patch, which I see didn't make it in &
> needs to be resurrected again:
> https://public-inbox.org/git/20181212171052.13415-1-cb@hashpling.org/

Thanks for the link.

I didn't know about config options for mailmap.file and log.mailmap
before. These do make this option much more useful, especially when we
can insert default settings for them into /etc/gitconfig across the
company.
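As a sketch of what such defaults might look like, the following sets the two keys in a scratch repository and points `mailmap.file` at a map stored outside the tree. The map location and the Alice/Bob Doe identities at myco.example are assumptions for the demonstration; on a real machine the same keys could be written with `git config --system` instead:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name='Alice Doe' -c user.email='alice.doe@myco.example' \
    commit -q --allow-empty -m 'initial commit'

# The map lives outside the repository; the location is an assumption.
mapfile=$(mktemp)
printf '%s\n' 'Bob Doe <bob.doe@myco.example> <alice.doe@myco.example>' > "$mapfile"

# Repo-local here so the sketch runs unprivileged; "git config --system"
# would write the same keys to the system-wide config file.
git config mailmap.file "$mapfile"
git config log.mailmap true

# Plain "git log" now applies the map without --use-mailmap.
shown=$(git log -1 | sed -n 's/^Author: *//p')
echo "$shown"
```

Keeping the map out of the repository also means the old name is not published with the project, though it is still readable on every machine that has the file.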


* Re: Deadname rewriting
  2019-06-21 21:12   ` Phil Hord
@ 2019-06-21 21:34     ` Ævar Arnfjörð Bjarmason
  2019-06-21 22:16       ` CB Bailey
  0 siblings, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-06-21 21:34 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git, CB Bailey


On Fri, Jun 21 2019, Phil Hord wrote:

> On Sat, Jun 15, 2019 at 1:19 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> On Sat, Jun 15 2019, Phil Hord wrote:
>>
>> > At $work we have a long time employee who has changed their name from
>> > Alice to Bob.  Bob doesn't want anyone to call him "Alice" anymore and
>> > is prone to be offended if they do.  This is called "deadnaming".
> ...
>> What should be done is to extend the .mailmap support to other
>> cases. I.e. make tools like blame, shortlog etc. show the equivalent of
>> %aN and %aE by default.
>
> It seems that shortlog and blame do use %aE and %aN by default.  Even
> log does.  It is only because I didn't know about %aN 10 years ago
> that my custom log format does not.
>
> It's a pity the format author has the option to ignore the mailmap. I
> think it's a choice commonly made by mistake rather than intention.  I
> wonder if anyone would mind a forced-override config.  Maybe a force
> flag in the .mailmap file itself.
>
>            <cto@company.xx>                       <cto@coompany.xx>
>            Other Author <other@author.xx>   nick2 <bugs@company.xx>
>            Bob Doe <bob.doe@myco.com>             <alice.doe@myco.com>  --force

Yeah I'm sure a lot of people who do %an really mean %aN, but blanket
forcing it seems a recipe for breakage since "log" and friends are also
used as plumbing where you really mean "what does it say in this commit
object".

E.g. I use %an intentionally for a company-internal tool to map an Alice
to Bob for reporting purposes, which presumably you'd also want.

But yeah, there'll be other uses that didn't intend it. I think probably
the best way forward is to just make git use %aN by default in
porcelain, and outside users presumably would get reports about such
issues eventually in cases like this where someone cared.

>> This topic was discussed at the last git contributor summit (brought up
>> by CB Bailey) resulting in this patch, which I see didn't make it in &
>> needs to be resurrected again:
>> https://public-inbox.org/git/20181212171052.13415-1-cb@hashpling.org/
>
> Thanks for the link.
>
> I didn't know about config options for mailmap.file and log.mailmap
> before. These do make this option much more useful, especially when we
> can insert default settings for them into /etc/gitconfig across the
> company.

Right, and to the extent that we don't --use-mailmap by default I think
that's mainly because nobody's cared enough to advocate for it. I think
it would be a sensible default.


* Re: Deadname rewriting
  2019-06-21 21:34     ` Ævar Arnfjörð Bjarmason
@ 2019-06-21 22:16       ` CB Bailey
  0 siblings, 0 replies; 8+ messages in thread
From: CB Bailey @ 2019-06-21 22:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Phil Hord, Git

On Fri, Jun 21, 2019 at 11:34:06PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> This topic was discussed at the last git contributor summit (brought up
> >> by CB Bailey) resulting in this patch, which I see didn't make it in &
> >> needs to be resurrected again:
> >> https://public-inbox.org/git/20181212171052.13415-1-cb@hashpling.org/
> >
> > Thanks for the link.
> >
> > I didn't know about config options for mailmap.file and log.mailmap
> > before. These do make this option much more useful, especially when we
> > can insert default settings for them into /etc/gitconfig across the
> > company.
> 
> Right, and to the extent that we don't --use-mailmap by default I think
> that's mainly because nobody's cared enough to advocate for it. I think
> it would be a sensible default.

That was this patch:

https://public-inbox.org/git/20181213120940.26477-1-cb@hashpling.org/

There were no objections so I was going to re-propose it but I haven't
got around to this for a number of reasons, many of which are not Git
related. Ideally, I wanted to fix all of the known issues with mailmap
such as some behaviors of shortlog fixed with the shortlog patch above.

I also noticed some more artifacts that I would like to be fixed. In
particular the RFC 822 style "trailers" should be rewritten by default.

Having something like this pop up is not likely to be acceptable in a
project which uses trailers:

commit abcd...
Author: Bob <bob@...>

    important commit message

    Signed-off-by: Alice <alice@...>

Obviously it's virtually impossible to account for everything such as
someone referencing Bob by their deadname in the free text body of a
historical commit.
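The trailer case is easy to reproduce in a scratch repository. A small sketch, with placeholder addresses at myco.example, showing that the mapped author and the unmapped trailer can disagree (this reflects the behavior I would expect from current Git, where the message body is shown as stored):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name='Alice' -c user.email='alice@myco.example' \
    commit -q --allow-empty -m 'important commit message' \
    -m 'Signed-off-by: Alice <alice@myco.example>'
printf '%s\n' 'Bob <bob@myco.example> <alice@myco.example>' > .mailmap

# The author field goes through the mailmap ...
author=$(git log -1 --use-mailmap --format='%aN <%aE>')
# ... but the trailer is part of the message body and is printed as stored.
trailer=$(git log -1 --use-mailmap --format='%B' | grep '^Signed-off-by:')
echo "Author:  $author"
echo "Trailer: $trailer"
```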

CB


