git@vger.kernel.org mailing list mirror (one of many)
* [RFC PATCH 0/2] Opaque author and committer identifiers
@ 2022-09-19 14:52 brian m. carlson
  2022-09-19 14:52 ` [RFC PATCH 1/2] doc: specify a header for including arbitrary format-patch metadata brian m. carlson
  2022-09-19 14:52 ` [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs brian m. carlson
  0 siblings, 2 replies; 7+ messages in thread
From: brian m. carlson @ 2022-09-19 14:52 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Ævar Arnfjörð Bjarmason,
	Emily Shaffer, Johannes Schindelin

There have been frequent discussions on the list about the mailmap and
how it's not currently the ideal way to map former identities to current
identities.  This is especially true for transgender people, who often
don't want to associate their deadname with their current name in plain
text.

This is an RFC series that I talked about to some folks at Git Merge.
Roughly, this documents a format for an opaque identifier which is
compatible with existing implementations by overloading the email
address field with something that is not a real email address and cannot
be confused with one.  This opaque identifier is the fingerprint of some
key, which in most cases will be an SSH key.

It also proposes moving the mailmap out of the main history and into a
special ref for this purpose.  Notably, so as not to make the same
mistake we did with grafts, where they are not pushed by default and so
nobody uses them, the proposal here is to change tooling so that the
mailmap refs are easy to push and pull and that this is done by default
(with an easy way to opt-out).  By default, local changes to the mailmap
ref are squashed into the current commit such that there is only one
commit on the ref.  This preserves the existing mapping while not
retaining former identities, which we don't really need. (Who wants to
send email to a contributor's address at a former employer which doesn't
work anymore?)

Since this series needs a way to cart mailmap information around in
patches and I would not like to repeat the same design as base-commit,
I've proposed a separate header to carry this information around.  I'm
not terribly attached to this proposal and am open to other ideas if
folks like them better, but I feel it moves us in a useful direction to
being able to include other metadata in a structured way and to sending
signed commits by patch, which other folks wanted to do (and I am in
favour of).  (I'm willing to implement such a feature based on this
approach in the future if folks desire.)

All of these changes will be optional to adopt.  Projects need not use
them if they don't want to.  However, I am proposing that they be
advertised prominently as a preferred option (for example, in the "Tell
me who you are" message) to encourage adoption.  Appropriate tooling
will be included to make this easy.

In addition, besides the general benefits for trans folks and the
ability to operate anonymously or pseudonymously, I also think using an
opaque identifier will cut down on spam.  I have received many unwelcome
solicitations from employers and survey-toting academics to my email
address, as I'm sure others have.  Receiving fewer of these in the
future will be a nice bonus.

For those folks using forges, it should be noted that associating an
identifier with an account should be very easy, since the forge usually
has SSH key support and commit verification and thus, the user's keys,
so there's no change to workflow on forges once they implement this
feature.  For those forges which use the user's personal name in the UI,
this can simply be replaced by the personal name the user has registered
with the forge.

None of this deals with rewriting identities in existing commits.  We
have what we have now and can't change it, but we can do something
different going forward.  If there is interest in the hashed mailmap
approach or another similar approach, I'm open to resurrecting that in
addition provided we agree as a project not to write tools which
trivially invert the hashed mailmap (which was the reason I dropped that
series in the first place).

I realize this is a radical departure from what we've done historically,
so this is an RFC series.  It's to gauge interest in this proposal and
design and to discuss alternatives before implementation. If we like
this approach, I will agree to implement it as my time allows, which I
expect could be done in a single series of under 30 patches.

I've CC'd some of the folks I talked to about this and some folks who I
think might be interested, but of course any constructive feedback is
welcome.

brian m. carlson (2):
  doc: specify a header for including arbitrary format-patch metadata
  docs: document a format for anonymous author and committer IDs

 Documentation/technical/anonymous-id.txt      | 143 ++++++++++++++++++
 .../technical/format-patch-metadata.txt       |  58 +++++++
 2 files changed, 201 insertions(+)
 create mode 100644 Documentation/technical/anonymous-id.txt
 create mode 100644 Documentation/technical/format-patch-metadata.txt


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH 1/2] doc: specify a header for including arbitrary format-patch metadata
  2022-09-19 14:52 [RFC PATCH 0/2] Opaque author and committer identifiers brian m. carlson
@ 2022-09-19 14:52 ` brian m. carlson
  2022-09-19 14:52 ` [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs brian m. carlson
  1 sibling, 0 replies; 7+ messages in thread
From: brian m. carlson @ 2022-09-19 14:52 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Ævar Arnfjörð Bjarmason,
	Emily Shaffer, Johannes Schindelin

Right now, we lack a way to specify arbitrary metadata for format-patch.
We currently special-case the base-commit value, but this is not helpful
in the general case.  There has also been interest in specifying
signatures for transport between machines using mailing list patches.

In a future commit, we will define a format for the author and committer
data such that the email represents an opaque ID instead of an email.
As a practical matter, this makes it difficult to send patches, since
many mail servers will not accept arbitrary From lines.  Even using
in-body From headers is not suitable here because we will want to
include entries in the mailmap out-of-band as part of the patch.

To make this case more general and allow us to specify this information
in a more general way, let's add a metadata header which can be included
in the patch and allow specifying arbitrary values that we can then fill
in.  We explicitly specify an extension mechanism to allow others to use
this data in a time-tested way that avoids conflicts.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 .../technical/format-patch-metadata.txt       | 55 +++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 Documentation/technical/format-patch-metadata.txt

diff --git a/Documentation/technical/format-patch-metadata.txt b/Documentation/technical/format-patch-metadata.txt
new file mode 100644
index 0000000000..5448918da9
--- /dev/null
+++ b/Documentation/technical/format-patch-metadata.txt
@@ -0,0 +1,55 @@
+format-patch Metadata
+=====================
+
+Background
+----------
+
+The current format-patch data lacks a way to express general metadata that may
+be useful to synthesize the original commit more accurately.  This may be
+helpful to emit patches as a transport for actual commits between machines in a
+case where bundles are not practical, such as a mailing list.
+
+Syntax
+------
+
+The syntax contains three space-separated components: a field name, an encoding,
+and field data.
+
+The field name contains no spaces.  Values without an `@` are specified below or
+by a future version of Git.  Values containing an `@` followed by a domain are
+specified by that domain owner, much like algorithm names in the SSH protocol.
+
+The encoding is either `plain`, in which case the field data is a literal string
+with no spaces, or `base64`, in which case the field data is one or more
+space-separated base64 items, which when interpreted have all spaces stripped
+and are then decoded.  This allows fields to be specified that contain octets
+which are not valid in, or which are too long to be specified in, an RFC 5322
+header unencoded.
+
+Fields
+------
+
+base-commit-sha1::
+	This specifies the base commit for this patch using a SHA-1 object ID.
+base-commit-sha256::
+	This specifies the base commit for this patch using a SHA-256
+	object ID.
+gpgsig-sha1::
+	This specifies a signature for this patch using the SHA-1 format, as specified
+	in the `gpgsig` header.
+gpgsig-sha256::
+	This specifies a signature for this patch using the SHA-256 format, as
+	specified in the `gpgsig-sha256` header.
+
+Examples
+--------
+
+----
+X-Git-Metadata: base-commit-sha1 plain da39a3ee5e6b4b0d3255bfef95601890afd80709
+
+X-Git-Metadata: gpgsig-sha256 base64
+  LS0tLS1CRUdJTiBTU0ggU0lHTkFUVVJFLS0tLS0KYmxhaCBibGFoIGJsYWgKLS0tLS1FTkQgU1NI
+  IFNJR05BVFVSRS0tLS0tCg==
+
+X-Git-Metadata: foo@example.com plain quux
+----
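
As a rough illustration of the syntax above, a reader of this header could be sketched in Python as follows.  This is only a sketch under stated assumptions: the function name `parse_git_metadata` is hypothetical, the input is assumed to be the header value after RFC 5322 unfolding, and the error handling is not part of the proposal.

```python
import base64


def parse_git_metadata(value):
    """Parse one X-Git-Metadata header value into (name, encoding, data).

    Hypothetical helper; assumes the header has already been unfolded.
    """
    # Folding leaves runs of whitespace, so split on any whitespace.
    parts = value.split()
    if len(parts) < 3:
        raise ValueError("expected field name, encoding, and field data")
    name, encoding, items = parts[0], parts[1], parts[2:]
    if encoding == "plain":
        # Plain field data is a literal string with no spaces.
        if len(items) != 1:
            raise ValueError("plain field data must not contain spaces")
        return name, encoding, items[0].encode("utf-8")
    if encoding == "base64":
        # Strip the spaces between the items, then decode the joined string.
        return name, encoding, base64.b64decode("".join(items), validate=True)
    raise ValueError("unknown encoding: " + encoding)
```

Splitting on whitespace first means a folded `base64` value and a single-line one are handled the same way.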


* [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs
  2022-09-19 14:52 [RFC PATCH 0/2] Opaque author and committer identifiers brian m. carlson
  2022-09-19 14:52 ` [RFC PATCH 1/2] doc: specify a header for including arbitrary format-patch metadata brian m. carlson
@ 2022-09-19 14:52 ` brian m. carlson
  2022-09-20 10:51   ` Ævar Arnfjörð Bjarmason
  2022-09-30 20:26   ` Gwyneth Morgan
  1 sibling, 2 replies; 7+ messages in thread
From: brian m. carlson @ 2022-09-19 14:52 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Ævar Arnfjörð Bjarmason,
	Emily Shaffer, Johannes Schindelin

The original design of Git embeds a personal name and email in every
commit.  This has lots of downsides, including the following.

First, people do not want to bake an email into an immutable Merkle tree
that they send everywhere.  Spam, whether in general or by recruiters,
is a problem, and even when it's not, people change companies or
institutions and emails become invalid.

Second, some people prefer to operate anonymously and don't want to
specify personal details everywhere.

Third, and most important, people change names.  This happens for many
reasons, but it comes up most saliently for transgender people, who
frequently change their name as part of their transition.  Referring to
a transgender person's former name, their "deadname", is considered
inappropriate.

We have a solution that can map former personal names and emails into
current ones, the mailmap.  However, this last case poses a problem,
because we don't really want to correlate the person's deadname (or
their email, which may contain their deadname) right next to their
current name.

Several solutions have been proposed for this case, including hashing or
encoding the old information, but these are all easily invertible.
Instead, let's propose a new form of identifier which is opaque and some
mailmap improvements to store the mailmap information outside of the
main history.

Propose that users use the fingerprint of a cryptographic key as part of
a special-form email which is not valid according to RFC 1123, but is
accepted by earlier versions of Git.  Now that we have SSH signing and
OpenSSH is available on all major platforms, creating a unique ID is as
easy as running ssh-keygen.  This approach results in an identifier
which is unique, deterministic, and completely anonymous.

Propose this new option instead of using a name and email, although
users can continue to use those as before if they prefer. Continue to
associate personal information with this opaque identifier using the
mailmap, but in such a way that it lives in a special ref outside of the
history and that ref is customarily kept squashed to a single commit.
Create a special RFC 5322 header to associate a mailmap entry with the
user's opaque identifier when sending a patch if desired.

Because the mailmap now lives outside the history in a single squashed
commit, a user may simply update their name by sending a new patch with
the same opaque ID, or proposing a change to the mailmap independently.
A person's former name or email address is not retained in the history
(unless the project chooses to do that for the mailmap ref).

Since many people use forges for hosting their code and forges offer
commit verification and SSH access, it is extremely easy for a forge to
associate a commit bearing this new opaque identifier with a user, since
it probably already has this information.  Thus, for projects which
use solely a forge-based development workflow, no mailmap entry need
even be created unless one is desired.  If one is desired, it may be
able to be created and updated automatically as part of the forge's
normal infrastructure simply upon sending a patch.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/technical/anonymous-id.txt      | 143 ++++++++++++++++++
 .../technical/format-patch-metadata.txt       |   3 +
 2 files changed, 146 insertions(+)
 create mode 100644 Documentation/technical/anonymous-id.txt

diff --git a/Documentation/technical/anonymous-id.txt b/Documentation/technical/anonymous-id.txt
new file mode 100644
index 0000000000..aeba5e68f2
--- /dev/null
+++ b/Documentation/technical/anonymous-id.txt
@@ -0,0 +1,143 @@
+Anonymous IDs
+=============
+
+Objective
+---------
+
+Provide a way for people to identify themselves without the need to associate a
+fixed personal name or email.
+
+Background
+----------
+
+People change their name and email many times over the course of their lives.
+For example, people may marry or change jobs.  In many cases, these changes can
+be handled by the mailmap.  However, for many transgender people, keeping the
+old name in the mailmap is often undesirable.
+
+This document proposes a new way to specify anonymous IDs based on an SSH key or
+GnuPG key instead, along with a mailmap which is automatically downloaded from
+the remote and provides an automatic correspondence.  In this approach, all
+users are expected to specify an anonymous ID and a mailmap entry.
+
+This does not solve the problem of previous commits, but it does solve the
+problem going forward if reasonably well adopted and avoids the weakness of
+existing approaches to obscuring the mailmap, which are defeated by simply
+enumerating all entries in all commits.
+
+Anonymous IDs
+-------------
+
+Git will implement a new form of email address which is acceptable to existing
+implementations but is not valid according to RFC 1123.  This takes the form of
+an email address where the local-part contains the identifier and the domain
+portion starts with `_.` and then a domain specifier which specifies an
+authority and the meaning of the identifier.
+
+In such a case, Git will specify the username as a single U+2060 in UTF-8 (the
+byte sequence 0xE2 0x81 0xA0), which is a zero width non-breaking space.  This
+is compatible with existing implementations.
+
+The Git project will specify a set of identifiers under the domain
+`id.git-scm.com`.  The next component is the type of key as specified by the
+`gpg.program` identifier, and then a component indicating the hash type or
+version number as specified below.
+
+This approach provides IDs which are simple and easy to create (almost all users
+will have an SSH implementation which can generate keys with a single command),
+opaque, completely deterministic, and not personally identifiable.
+
+Other authorities, such as hosting providers, may use different IDs.  For
+example, the hosting provider example.com might issue the ID
+`1234@_.user.example.com` for user ID 1234.  Authorities are encouraged to use
+database IDs or other unique IDs rather than usernames, since many usernames
+contain human names or corporate affiliations, which defeats the point of this
+feature.
+
+In conjunction with a single, constantly rewritten mailmap reference and
+`mailmap.blob`, this allows users to move their real IDs outside of the commit
+IDs into a mailmap which is constantly rewritten.  If a user's real name or
+email changes, they can submit an update to the mailmap and the ID, which will
+be squashed into a single commit without history.
+
+Specifications
+~~~~~~~~~~~~~~
+
+OpenPGP Keys
+^^^^^^^^^^^^
+
+If a user possesses a v4 OpenPGP key, then they may use the domain
+`_.v4.openpgp.id.git-scm.com` using a lowercase hex form of the SHA-1
+fingerprint as the local-part.  For example, the key with the fingerprint
+`da39a3ee5e6b4b0d3255bfef95601890afd80709` would have the email address
+`da39a3ee5e6b4b0d3255bfef95601890afd80709@_.v4.openpgp.id.git-scm.com`.
+
+Similarly, when RFC 4880 bis is implemented using v5 keys with SHA-256
+fingerprints, the domain `_.v5.openpgp.id.git-scm.com` may be used with a
+lowercase hex form of the SHA-256 fingerprint as the local-part.
+
+SSH Keys
+^^^^^^^^
+
+If a user possesses an SSH key, then they may use the domain
+`_.sha256.ssh.id.git-scm.com` using a base64url encoding (without padding) of
+the SHA-256 fingerprint as the local-part (RFC 4648 Base64 with the URL and
+filename safe alphabet, minus the padding character).  For example, a user
+whose SSH key fingerprint is `47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU` may use
+`47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU@_.sha256.ssh.id.git-scm.com`.
+
+It's intentional that no specification is provided for MD5 fingerprints.  MD5 is
+obsolete and should not be used in new protocols such as this.
+
+X.509 Certificates
+^^^^^^^^^^^^^^^^^^
+
+If a user possesses an X.509 certificate, then they may use the domain
+`_.sha256.x509.id.git-scm.com` using a lowercase hex form of the SHA-256
+fingerprint of the certificate.  For example, if the key fingerprint is
+`e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855`, then the ID
+would be
+`e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855@_.sha256.x509.id.git-scm.com`.
+
+Emission
+~~~~~~~~
+
+A user may specify, instead of `user.email`, a `user.signingkey` (or a suitable
+protocol-specific setting).  If `user.idFormat` is set to `email`, then the
+user's email will be written into the commit; if it is instead set to `key`,
+then the ID corresponding to the key is extracted from the signing program and
+that is used instead.  `id` can be used to specify the `user.id` value. An order
+of items to try can be specified with a colon-separated list.  The default, which
+is subject to change, is `id:email:key`.  This allows users to specify an
+ID which is independent of their email.
+
+For patches, a user may specify `format.id` as `as-is` to leave the data as is,
+or as `mailmap` to rewrite it to the value in the mailmap.  If the user
+specifies `mailmap-metadata`, then an in-body `From:` line in the patch is
+written to contain the author ID using the ID as written in the commit, but a
+format-patch metadata header is written using the mailmap entry in the
+commit.
+
+Expected Mailmap Improvements
+-----------------------------
+
+Right now, the mailmap is included in a repository as part of a regular commit.
+This means it has a history, which is undesirable if the user would like to
+completely rewrite their identity.
+
+This can be easily solved with some mailmap improvements.  `git clone` will
+learn an option, `--use-mailmap`, which will specifically fetch the ref
+`refs/mailmap` from the remote and keep it up to date using force updates if
+necessary.  This option will also specify `mailmap.blob` to point to the
+`.mailmap` file in this ref, which allows the user to automatically keep it up
+to date with the remote.
+
+`git am` or `git apply` can then apply the mailmap entry from the patch to the
+appropriate ref with `--use-mailmap`.  The default is `--use-mailmap=amend`,
+which amends the existing commit.  If a user would like to preserve a history
+for some reason, they can use `--use-mailmap=commit`.  Maintainers can then
+push this ref using the normal push refspecs, or explicitly with
+`--mailmap`, which is equivalent to `+refs/mailmap:refs/mailmap`.
+
+The goal of this is to make interacting with the mailmap refs automatic and
+transparent whenever other data is fetched or cloned from the remote.
diff --git a/Documentation/technical/format-patch-metadata.txt b/Documentation/technical/format-patch-metadata.txt
index 5448918da9..87e301b65e 100644
--- a/Documentation/technical/format-patch-metadata.txt
+++ b/Documentation/technical/format-patch-metadata.txt
@@ -40,6 +40,9 @@ gpgsig-sha1::
 gpgsig-sha256::
 	This specifies a signature for this patch using the SHA-256 format, as
 	specified in the `gpgsig-sha256` header.
+mailmap-author::
+	This specifies the mailmap entry to associate with the email address or other
+	identifier in the `From:` header.
 
 Examples
 --------
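
To make the ID formats in anonymous-id.txt concrete, here is a minimal Python sketch that builds IDs from fingerprints.  The function names are hypothetical, and accepting a leading `SHA256:` prefix or fingerprints written with spaces or colons is an assumption about tool output rather than part of the specification.

```python
import re


def ssh_anonymous_id(fingerprint):
    """Build the ID for an SSH key from its SHA-256 fingerprint."""
    # Assumption: tolerate the "SHA256:" prefix printed by `ssh-keygen -l`.
    if fingerprint.startswith("SHA256:"):
        fingerprint = fingerprint[len("SHA256:"):]
    # base64 -> base64url without padding: '+' becomes '-', '/' becomes '_'.
    local = fingerprint.rstrip("=").replace("+", "-").replace("/", "_")
    # A 32-byte SHA-256 digest is 43 characters in unpadded base64.
    if not re.fullmatch(r"[A-Za-z0-9_-]{43}", local):
        raise ValueError("not a base64 SHA-256 fingerprint")
    return local + "@_.sha256.ssh.id.git-scm.com"


def hex_anonymous_id(fingerprint, domain, hex_digits):
    """Build an ID whose local-part is a lowercase hex fingerprint."""
    # Assumption: drop the spaces or colons that some tools print.
    local = re.sub(r"[\s:]", "", fingerprint).lower()
    if not re.fullmatch(r"[0-9a-f]{%d}" % hex_digits, local):
        raise ValueError("unexpected fingerprint length or characters")
    return local + "@" + domain


def openpgp_v4_id(fingerprint):
    # v4 OpenPGP fingerprints are SHA-1, so 40 hex digits.
    return hex_anonymous_id(fingerprint, "_.v4.openpgp.id.git-scm.com", 40)


def x509_id(fingerprint):
    # The certificate fingerprint is SHA-256, so 64 hex digits.
    return hex_anonymous_id(fingerprint, "_.sha256.x509.id.git-scm.com", 64)
```

With the fingerprints used in the document, `ssh_anonymous_id("47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU")` yields the SSH example address given there.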


* Re: [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs
  2022-09-19 14:52 ` [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs brian m. carlson
@ 2022-09-20 10:51   ` Ævar Arnfjörð Bjarmason
  2022-09-22  0:08     ` brian m. carlson
  2022-09-30 20:26   ` Gwyneth Morgan
  1 sibling, 1 reply; 7+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-20 10:51 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Taylor Blau, Emily Shaffer, Johannes Schindelin


On Mon, Sep 19 2022, brian m. carlson wrote:

> The original design of Git embeds a personal name and email in every
> commit.  This has lots of downsides, including the following.
>
> First, people do not want to bake an email into an immutable Merkle tree
> that they send everywhere.  Spam, whether in general or by recruiters,
> is a problem, and even when it's not, people change companies or
> institutions and emails become invalid.
>
> Second, some people prefer to operate anonymously and don't want to
> specify personal details everywhere.
>
> Third, and most important, people change names.  This happens for many
> reasons, but it comes up most saliently for transgender people, who
> frequently change their name as part of their transition.  Referring to
> a transgender person's former name, their "deadname", is considered
> inappropriate.
>
> We have a solution that can map former personal names and emails into
> current ones, the mailmap.  However, this last case poses a problem,
> because we don't really want to correlate the person's deadname (or
> their email, which may contain their deadname) right next to their
> current name.
>
> Several solutions have been proposed for this case, including hashing or
> encoding the old information, but these are all easily invertible.
> Instead, let's propose a new form of identifier which is opaque and some
> mailmap improvements to store the mailmap information outside of the
> main history.

With you so far...

> Propose that users use the fingerprint of a cryptographic key as part of
> a special-form email which is not valid according to RFC 1123, but is
> accepted by earlier versions of Git.  Now that we have SSH signing and
> OpenSSH is available on all major platforms, creating a unique ID is as
> easy as running ssh-keygen.  This approach results in an identifier
> which is unique, deterministic, and completely anonymous.

...but...

> Propose this new option instead of using a name and email, although
> users can continue to use those as before if they prefer. Continue to
> associate personal information with this opaque identifier using the
> mailmap, but in such a way that it lives in a special ref outside of the
> history and that ref is customarily kept squashed to a single commit.
> Create a special RFC 5322 header to associate a mailmap entry with the
> user's opaque identifier when sending a patch if desired.

...while it's technically neat, I really don't see why this whole
hashing mechanism is a necessary prerequisite to get to this point.

Wouldn't we get the same thing if *by convention* we just supported
authorship like this, (which we already support):

	UUID=$(get-some-uuid)
	git config user.name X
	git config user.email $UUID.uuid.git.example.org

So you'd end up with e.g.:

	X <98ab8d66-38d2-11ed-a261-0242ac120002.uuid.git.example.com>

Or whatever, we could bikeshed about the format, but the point is that
it's not codifying *how* that looks.

We'd then just support this refs/mailmap mechanism you're suggesting,
where we'd have a mapping like:

      Ævar Arnfjörð Bjarmason <avarab@gmail.com> X <98ab8d66-38d2-11ed-a261-0242ac120002.uuid.git.example.com>

Which could be force-pushed.

I can see why you'd *also* want to formalize the ID generation, but I
just don't see why we'd want to make that as one leaping change rather
than something more incremental.

I.e. even if you don't have opaque IDs in the first place this mechanism
would allow you to maintain a "mailmap" ref on the remote, which would
already be useful.

E.g. now if I use a hosting provider and have my .mailmap in various
repos I need to maintain them in each repo, but this would allow for a
magical ref which would keep it up-to-date in various repos...

> [...]If a user would like to preserve a history
> +for some reason, they can use `--use-mailmap=commit`.  For maintainers, they can
> +then push this ref using the normal push refspecs, or explicitly with
> +`--mailmap`, which is equivalent to `+refs/mailmap:refs/mailmap`.

I obviously see why you want the "force push" aspect of this (the
deadnaming), but I still wonder if it's really a good trade-off for git
as an SCM to make that the default.

We've been going in the other direction for e.g. tags semi-recently with
my 0bc8d71b99e (fetch: stop clobbering existing tags without --force,
2018-08-31).

By having that force-push default we make it so that a plumbing command
(that makes use of mailmap) will give you one result today, but a
different one tomorrow, with no easy way to get back.

Maybe it's something we want in the end, but it's another thing that's
"changed while at it", i.e. not only are we introducing "mailmap" remote
refs, but also:

 * Changing the many-to-many mapping of history-mailmap to a
   many-to-one, i.e. the map is per-repo, not per-ref.

 * Changing it so that you can't track it as part of your history.

If we wanted to ease into just one of those we could have a "mailmap"
tag object, which we wouldn't clobber by default....



* Re: [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs
  2022-09-20 10:51   ` Ævar Arnfjörð Bjarmason
@ 2022-09-22  0:08     ` brian m. carlson
  0 siblings, 0 replies; 7+ messages in thread
From: brian m. carlson @ 2022-09-22  0:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Taylor Blau, Emily Shaffer, Johannes Schindelin


On 2022-09-20 at 10:51:39, Ævar Arnfjörð Bjarmason wrote:
> Wouldn't we get the same thing if *by convention* we just supported
> authorship like this, (which we already support):
> 
> 	UUID=$(get-some-uuid)
>         git config user.name X
>         git config user.email $UUID.uuid.git.example.org

You can indeed use a UUID if you want.  However, it's not deterministic.

Using a key hash also means account linking is trivially implemented in
forges.  If we use a UUID, then there's no way to prove ownership of the
identifier, which means that people can claim other people's commits.
Signed commits don't help here because you can't embed arbitrary
non-emails in X.509 (or in OpenPGP, because nobody will certify such an
ID), so you have no way of linking the commit identity to the key and
therefore signed commits are worse than before.  At least with an email
you can verify that the owner of the account owns the email address, but
you can't do that with a UUID.

I want a design that works whether or not you use a forge, but realizing
that most developers use forges these days, I want to make the workflow as
simple and straightforward as possible for those who do.  I also want a
design which is going to be acceptable to forge implementers, and
working for one, I think this design is going to be easier to implement
and more likely to be accepted than an ID which requires extra work and
isn't verifiable.

For ease of use, I would be implementing tooling to make it easy to set
this from an existing user.signingkey or SSH key on the system.  I literally
envision this being as simple as something like `git id --set -f
~/.ssh/id_ed25519` or `git id --set --generate-ssh-key`.  (This is just
an example; we can argue about the details later.)
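
To give a feel for how little such tooling has to do in the SSH case, here is a rough Python sketch that derives the ID from a line of an OpenSSH `.pub` file.  The function name is hypothetical and this is not the proposed `git id` implementation; it only relies on the OpenSSH SHA-256 fingerprint being the SHA-256 digest of the decoded key blob.

```python
import base64
import hashlib


def ssh_id_from_public_key(line):
    """Derive the anonymous ID from one OpenSSH public key line.

    Hypothetical helper; expects "<key type> <base64 blob> [comment]".
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64 blob> [comment]'")
    # The fingerprint is computed over the decoded key blob, not the text.
    blob = base64.b64decode(fields[1], validate=True)
    digest = hashlib.sha256(blob).digest()
    # base64url without padding, as the SSH section of the document requires.
    local = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return local + "@_.sha256.ssh.id.git-scm.com"
```

The result depends only on the key itself, which is what makes the ID deterministic.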

> So you'd end up with e.g.:
> 
> 	X <98ab8d66-38d2-11ed-a261-0242ac120002.uuid.git.example.com>
> 
> Or whatever, we could bikeshed about the format, but the point is that
> it's not codifying *how* that looks.

I do very much want to codify how this looks because people are
absolutely going to rely on it, whether we want them to or not.  People
already parse GitHub's fake no-reply emails for information.  Everything
that Git does people rely on, whether we like it or not.

Keeping it in the form of an email maximizes compatibility for existing
implementations.

> We'd then just support this refs/mailmap mechanism you're suggesting,
> where we'd have a mapping like:
> 
>       Ævar Arnfjörð Bjarmason <avarab@gmail.com> X <98ab8d66-38d2-11ed-a261-0242ac120002.uuid.git.example.com>
> 
> Which could be force-pushed.
> 
> I can see why you'd *also* want to formalize the ID generation, but I
> just don't see why we'd want to make that as one leaping change rather
> than something more incremental.

We can make it as incremental as folks want.  However, the longer we
have people embedding their real names and emails in an immutable Merkle
tree, the longer we're going to run into deadname problems.  Thus,
encouraging this new form of ID sooner means that people will adopt it
sooner.

If this is the only impediment, we can make it more gradual.

> I.e. even if you don't have opaque IDs in the first place this mechanism
> would allow you to maintain a "mailmap" ref on the remote, which would
> already be useful.
> 
> E.g. now if I use a hosting provider and have my .mailmap in various
> repos I need to maintain them in each repo, but this would allow for a
> magical ref which would keep it up-to-date in various repos...

That's part of the goal.

> I obviously see why you want the "force push" aspect of this (the
> deadnaming), but I still wonder if it's really a good trade-off for git
> as an SCM to make that the default.
> 
> We've been going in the other direction for e.g. tags semi-recently with
> my 0bc8d71b99e (fetch: stop clobbering existing tags without --force,
> 2018-08-31).
> 
> By having that force-push default we make it so that a plumbing command
> (that makes use of mailmap) will give you one result today, but a
> different one tomorrow, with no easy way to get back.

I think force-pushing semantics has a nicer behaviour for my use case,
but it's not essential.  If the mailmap is in a separate ref, then if I
work at $MEGACORP and need to update the mailmap because of a name
change, I can still just rewrite the history, and as long as we preserve
the force-fetch behaviour by default, then it will just work.

I _do_ think we should retain the force-fetch behaviour by default.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA



* Re: [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs
  2022-09-19 14:52 ` [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs brian m. carlson
  2022-09-20 10:51   ` Ævar Arnfjörð Bjarmason
@ 2022-09-30 20:26   ` Gwyneth Morgan
  2022-10-02  0:27     ` brian m. carlson
  1 sibling, 1 reply; 7+ messages in thread
From: Gwyneth Morgan @ 2022-09-30 20:26 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Taylor Blau, Ævar Arnfjörð Bjarmason,
	Emily Shaffer, Johannes Schindelin

In general, I like this proposal. It seems like a good way forward.

It should be made very clear to the user that a commit authored by a
key-derived ID does not imply the commit is signed by that key or
provide any security guarantees; anyone can put anything in that field,
same as it is now. I could see someone seeing a commit authored by
<47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU@_.sha256.ssh.id.git-scm.com>
and thinking that implies the commit was signed by
`47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU`.
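
To illustrate how the two spellings above relate, here is a minimal Python
sketch.  It assumes the local-part is simply the SHA-256 key fingerprint in
URL-safe base64 without padding, which is what the example pair suggests; the
function names are hypothetical and the domain suffix is copied from the
address above.  The example fingerprint happens to be the SHA-256 of empty
input, so the sketch can be checked against it.  Nothing here verifies a
signature, which is the point: the ID is only a string.

```python
import base64
import hashlib

# Domain suffix copied from the example address in this thread.
ID_DOMAIN = "_.sha256.ssh.id.git-scm.com"


def fingerprint_to_local_part(fingerprint: str) -> str:
    """Respell an SSH-style base64 fingerprint with the URL-safe alphabet.

    Assumption: "+" becomes "-", "/" becomes "_", and padding is dropped.
    """
    return fingerprint.translate(str.maketrans("+/", "-_")).rstrip("=")


def opaque_address(key_blob: bytes) -> str:
    """Build a hypothetical opaque ID address from raw public-key bytes."""
    digest = hashlib.sha256(key_blob).digest()
    fingerprint = base64.b64encode(digest).decode("ascii").rstrip("=")
    return f"{fingerprint_to_local_part(fingerprint)}@{ID_DOMAIN}"


if __name__ == "__main__":
    # Empty input reproduces the example ID quoted in this message.
    print(opaque_address(b""))
```

Anyone can run this on any public key and put the result in the author
field, so the ID by itself proves nothing about who made the commit.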

On 2022-09-19 14:52:31+0000, brian m. carlson wrote:
> +Anonymous IDs
> +-------------
> +
> +Git will implement a new form of email address which is acceptable to existing
> +implementations but is not valid according to RFC 1123.  This takes the form of
> +an email address where the local-part contains the identifier and the domain
> +portion starts with `_.` and then a domain specifier which specifies an
> +authority and the meaning of the identifier.
> +
> +In such a case, Git will specify the username as a single U+2060 in UTF-8 (the
> +byte sequence 0xE2 0x81 0xA0), which is a zero width non-breaking space.  This
> +is compatible with existing implementations.

Could you add a note here explaining why that character was chosen for
the name field? It seems like it would be easier to work with a single
printable character like `?` or `X`, but maybe that doesn't matter here.

* Re: [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs
  2022-09-30 20:26   ` Gwyneth Morgan
@ 2022-10-02  0:27     ` brian m. carlson
  0 siblings, 0 replies; 7+ messages in thread
From: brian m. carlson @ 2022-10-02  0:27 UTC (permalink / raw)
  To: Gwyneth Morgan
  Cc: git, Taylor Blau, Ævar Arnfjörð Bjarmason,
	Emily Shaffer, Johannes Schindelin

On 2022-09-30 at 20:26:41, Gwyneth Morgan wrote:
> In general, I like this proposal. It seems like a good way forward.
> 
> It should be made very clear to the user that a commit authored by a
> key-derived ID does not imply the commit is signed by that key or
> provide any security guarantees; anyone can put anything in that field,
> same as it is now. I could see someone seeing a commit authored by
> <47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU@_.sha256.ssh.id.git-scm.com>
> and thinking that implies the commit was signed by
> `47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU`.

Of course.  I'll update that when I turn this into a real series.

> On 2022-09-19 14:52:31+0000, brian m. carlson wrote:
> > +Anonymous IDs
> > +-------------
> > +
> > +Git will implement a new form of email address which is acceptable to existing
> > +implementations but is not valid according to RFC 1123.  This takes the form of
> > +an email address where the local-part contains the identifier and the domain
> > +portion starts with `_.` and then a domain specifier which specifies an
> > +authority and the meaning of the identifier.
> > +
> > +In such a case, Git will specify the username as a single U+2060 in UTF-8 (the
> > +byte sequence 0xE2 0x81 0xA0), which is a zero width non-breaking space.  This
> > +is compatible with existing implementations.
> 
> Could you add a note here explaining why that character was chosen for
> the name field? It seems like it would be easier to work with a single
> printable character like `?` or `X`, but maybe that doesn't matter here.

Sure, I'll include that there.  The author field cannot be empty for
compatibility reasons.  Since there's nothing to put there until it's
run through the mailmap, putting a single zero-width non-breaking space
produces the same rendering as nothing, and it doesn't require special
handling like "?" or "X". (Also, it should be noted that not all
languages use "?" as the question mark.)
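
As a rough illustration of the placeholder name, the following Python sketch
builds an identity string whose name part is a single U+2060 and shows the
UTF-8 bytes quoted in the patch.  Unicode names this code point WORD JOINER;
it is a zero-width format character.  The `ident` helper and the sample
address are hypothetical, not part of Git.

```python
import unicodedata

# The placeholder personal name: a single U+2060.
PLACEHOLDER_NAME = "\u2060"


def ident(email: str) -> str:
    """Format a "name <email>" identity using the placeholder name."""
    return f"{PLACEHOLDER_NAME} <{email}>"


if __name__ == "__main__":
    # The three bytes quoted in the patch text.
    print(PLACEHOLDER_NAME.encode("utf-8").hex(" "))
    # Unicode's own name and general category for the code point.
    print(unicodedata.name(PLACEHOLDER_NAME))
    print(unicodedata.category(PLACEHOLDER_NAME))
    # The name field is not empty, even though it renders as nothing.
    print(len(ident("id@example.invalid").split(" <")[0]))
```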

Note that if this is mapped in the mailmap, you don't need to actually
put the personal name that exists in the commit.  The mailmap rewrites
based on the email address (or, in this case, the ID), so nobody ever
has to write the U+2060 in the mailmap.
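
A small Python sketch of that lookup, under simplifying assumptions: it
handles only the mailmap form `Proper Name <commit@email>`, matches the
address exactly, and is not how Git itself parses the file.  The name
"A. U. Thor" and the function names are made up for the example.

```python
import re

# One simple mailmap form only: "Proper Name <commit@email>".
_ENTRY = re.compile(r"^(?P<name>[^<]*?)\s*<(?P<email>[^>]+)>\s*$")


def parse_mailmap(text: str) -> dict:
    """Return a mapping from commit email (or opaque ID) to proper name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _ENTRY.match(line)
        if match and match.group("name"):
            mapping[match.group("email")] = match.group("name")
    return mapping


def display_name(mapping: dict, commit_name: str, commit_email: str) -> str:
    """Look up by email only; the name stored in the commit is a fallback."""
    return mapping.get(commit_email, commit_name)


if __name__ == "__main__":
    opaque = "47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU@_.sha256.ssh.id.git-scm.com"
    mailmap = parse_mailmap(f"# hypothetical entry\nA. U. Thor <{opaque}>\n")
    # The commit carries only U+2060 as its name; the entry never mentions it.
    print(display_name(mailmap, "\u2060", opaque))
```

Because the key is the address alone, the entry works no matter what name
the commit carries.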
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

