git@vger.kernel.org mailing list mirror (one of many)
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: <git@vger.kernel.org>
Cc: "Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Phillip Wood" <phillip.wood123@gmail.com>
Subject: [PATCH v2 3/5] t4203: add failing test for case-sensitive local-parts and names
Date: Sun,  3 Jan 2021 21:18:47 +0000
Message-ID: <20210103211849.2691287-4-sandals@crustytoothpaste.net>
In-Reply-To: <20210103211849.2691287-1-sandals@crustytoothpaste.net>

Currently, Git always looks up entries in the mailmap in a
case-insensitive way, both for names and addresses, which is, as
explained below, suboptimal.

First, for email addresses, RFC 5321 is clear that only domains are case
insensitive; local-parts (the portion before the at sign) are not.  It
states this:

  The local-part of a mailbox MUST BE treated as case sensitive.
  Therefore, SMTP implementations MUST take care to preserve the case of
  mailbox local-parts.

There exist systems today where local-parts remain case sensitive (and
this author has one), and as such, it's incorrect for us to case fold
them in any way.  Let's add a failing test that indicates this is a
problem, while still keeping the test for case-insensitive domains.
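
As a rough illustration of the rule quoted above, the following sketch
compares two addresses with a byte-exact local-part and an
ASCII-case-insensitive domain.  The helper name same_mailbox is
hypothetical and is not part of Git or its test suite; it assumes each
argument contains an at sign and splits at the last one.

```shell
#!/bin/sh
# Hypothetical helper, not Git code: succeed when two addresses name the
# same mailbox, treating the local-part as case sensitive and the domain
# as case insensitive.
same_mailbox () {
	# Split each address at its last "@".
	local_a=${1%@*}
	domain_a=${1##*@}
	local_b=${2%@*}
	domain_b=${2##*@}
	# Fold only the domains, and only ASCII letters.
	domain_a=$(printf '%s' "$domain_a" | LC_ALL=C tr 'A-Z' 'a-z')
	domain_b=$(printf '%s' "$domain_b" | LC_ALL=C tr 'A-Z' 'a-z')
	# The local-parts must match byte for byte.
	test "$local_a" = "$local_b" && test "$domain_a" = "$domain_b"
}
```

Under this sketch, "bugs@company.xx" and "bugs@Company.xx" compare equal,
while "bugs@company.xx" and "BUGS@company.xx" do not.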

Note that it's also incorrect for us to case-fold names because we don't
guarantee that we're using the locale of the author, and it's impossible
to case-fold names in a locale-insensitive way.  Turkish and Azeri
contain both a dotted and a dotless I: in those locales the uppercase
ASCII I lowercases not to the ASCII i but to the dotless ı, and the
lowercase ASCII i uppercases to the dotted İ.  There are many words in
Turkish which differ only in
the dottedness of the I, so it is likely that there are also personal
names which differ in the same way.

That would be a problem even if our implementation were perfect, which
it is not.  We currently fold only ASCII characters, so this feature has
never worked correctly for the vast majority of the users on the planet,
regardless of the locale.  For example, on Linux, even in a Spanish
locale, we don't handle "Simón" properly.  Even if we did handle that,
we'd probably also want to implement Unicode normalization, which we
don't.
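
The ASCII-only limitation can be sketched outside of Git as follows.
This mimics the behavior described above with tr in the C locale rather
than calling Git's own comparison code, and the helper name ascii_fold is
hypothetical.  The octal escapes spell the UTF-8 bytes of "Ó" (\303\223)
and "ó" (\303\263) so the example does not depend on the encoding of the
script file.

```shell
#!/bin/sh
# Hypothetical helper, not Git code: lowercase ASCII letters only and
# pass every other byte through untouched.
ascii_fold () {
	printf '%s' "$1" | LC_ALL=C tr 'A-Z' 'a-z'
}

upper=$(printf 'SIM\303\223N')	# "SIMÓN" in UTF-8
lower=$(printf 'Sim\303\263n')	# "Simón" in UTF-8

# The ASCII letters fold, but the accented letter does not, so the two
# spellings still compare as different after folding.
if test "$(ascii_fold "$upper")" = "$(ascii_fold "$lower")"
then
	echo "folded forms match"
else
	echo "folded forms differ"
fi
```

A purely ASCII pair such as "NICK1" and "nick1" does fold to the same
bytes, which is the only case this style of folding gets right.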

In general, case-folding text is extremely language- and locale-specific
and requires intimacy with the spelling and grammar of the language in
question and careful attention to the Unicode details in order to
produce a result that is meaningful to humans and conforms with
linguistic and societal norms.

Because we do not have any of the required context with a plain personal
name, we cannot hope to possibly case-fold personal names correctly.  We
should stop trying to do so and just treat them as a series of bytes, so
let's add a test that we don't case-fold personal names as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t4203-mailmap.sh | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 586c3a86b1..32e849504c 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -170,10 +170,35 @@ Repo Guy (1):
 
 EOF
 
-test_expect_success 'name entry after email entry, case-insensitive' '
+test_expect_success 'name entry after email entry, case-insensitive domain' '
 	mkdir -p internal_mailmap &&
 	echo "<bugs@company.xy> <bugs@company.xx>" >internal_mailmap/.mailmap &&
-	echo "Internal Guy <BUGS@Company.xx>" >>internal_mailmap/.mailmap &&
+	echo "Internal Guy <bugs@Company.xx>" >>internal_mailmap/.mailmap &&
+	git shortlog HEAD >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<\EOF
+Repo Guy (1):
+      initial
+
+nick1 (1):
+      second
+
+EOF
+
+test_expect_failure 'name entry after email entry, case-sensitive local-part' '
+	mkdir -p internal_mailmap &&
+	echo "<bugs@company.xy> <bugs@company.xx>" >internal_mailmap/.mailmap &&
+	echo "Internal Guy <BUGS@company.xx>" >>internal_mailmap/.mailmap &&
+	git shortlog HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'name entry after email entry, case-sensitive personal name' '
+	mkdir -p internal_mailmap &&
+	echo "<bugs@company.xy> <bugs@company.xx>" >internal_mailmap/.mailmap &&
+	echo "Nick1 <bugs@company.xx> NICK1 <bugs@company.xx>" >internal_mailmap/.mailmap &&
 	git shortlog HEAD >actual &&
 	test_cmp expect actual
 '
