git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Bug in .mailmap handling?
@ 2013-07-12 16:07 Stefan Beller
  2013-07-12 17:35 ` Junio C Hamano
  2013-07-12 20:28 ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Stefan Beller @ 2013-07-12 16:07 UTC (permalink / raw)
  To: git

Hello,

you may have noticed I am currently trying to bring the 
mailmap file of git itself up to date. I noticed
some behavior, which I did not expect. Have a look yourself:

---
	# prepare test environment:
	mkdir testmailmap
	cd testmailmap/
	git init

	# do a commit:
	echo "asdf" > test1 
	git add test1
	git commit -a --author="A <A@example.org>" -m "add test1"

	# commit with same name, but different email 
	# (different capitalization does the trick already, 
	# but here I am going to use a different mail)
	echo "asdf" > test2
	git add test2
	git commit -a --author="A <changed_email@example.org>" -m "add test2"

	# how do we know it's the same person?
	git shortlog
	A (2):
		  add test1
		  add test2

	# reports as expected:
	git shortlog -sne
		  1  A <A@example.org>
		  1  A <changed_email@example.org>
		  
	# Adding the line to the mailmap should make life easy, so we know
	# it's the same person
	echo "A <A@example.org> <changed_email@example.org>" > .mailmap

	# Come on, I just wanted to have it reported as one person!
	git shortlog -sne
		 1  A <A@example.org>
		 1  A <a@example.org>
		 
	# So let's try another line in the mailmap file, (small 'a')
	echo "A <a@example.org> <changed_email@example.org>" > .mailmap

	# We're not there yet?
	git shortlog -sne
		 1  A <A@example.org>
		 1  A <a@example.org>

	# Now let's write it rather explicit: 
	# (essentially just write 2 lines into the mailmap file)
	cat << EOF > .mailmap
	A <a@example.org> <changed_email@example.org>
	A <a@example.org> <A@example.org>
	EOF
		 
	# works as expected now
	git shortlog -sne
		 2  A <a@example.org>

	# works as expected now as well
	git shortlog      
	A (2):
		  add test1
		  add test2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in .mailmap handling?
  2013-07-12 16:07 Bug in .mailmap handling? Stefan Beller
@ 2013-07-12 17:35 ` Junio C Hamano
  2013-07-12 17:47   ` Stefan Beller
  2013-07-12 20:28 ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2013-07-12 17:35 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <stefanbeller@googlemail.com> writes:

> Hello,
>
> you may have noticed I am currently trying to bring the 
> mailmap file of git itself up to date. I noticed
> some behavior, which I did not expect. Have a look yourself:
>
> ---
> 	# prepare test environment:
> 	mkdir testmailmap
> 	cd testmailmap/
> 	git init
>
> 	# do a commit:
> 	echo "asdf" > test1 
> 	git add test1
> 	git commit -a --author="A <A@example.org>" -m "add test1"
>
> 	# commit with same name, but different email 
> 	# (different capitalization does the trick already, 
> 	# but here I am going to use a different mail)
> 	echo "asdf" > test2
> 	git add test2
> 	git commit -a --author="A <changed_email@example.org>" -m "add test2"
>
> 	# how do we know it's the same person?
> 	git shortlog
> 	A (2):
> 		  add test1
> 		  add test2

You don't, and it is a long known behaviour.

> 	# reports as expected:
> 	git shortlog -sne
> 		  1  A <A@example.org>
> 		  1  A <changed_email@example.org>

Yes.

> 	# Adding the line to the mailmap should make life easy, so we know
> 	# it's the same person
> 	echo "A <A@example.org> <changed_email@example.org>" > .mailmap
>
> 	# Come on, I just wanted to have it reported as one person!
> 	git shortlog -sne
> 		 1  A <A@example.org>
> 		 1  A <a@example.org>

Err, where does the lowercase a@ come from in the above?  Are we
missing some steps before we get here?

> 	# So let's try another line in the mailmap file, (small 'a')
> 	echo "A <a@example.org> <changed_email@example.org>" > .mailmap

This is ">", not ">>", I presume?  Otherwise changed_email is mapped
to two destination, no?

> 	# We're not there yet?
> 	git shortlog -sne
> 		 1  A <A@example.org>
> 		 1  A <a@example.org>

Expected, as long as some hidden set-up you did not describe that
caused me to say "Err, where does the lowercase a@ come from" is
there, i.e. one of the two commits is done by <a@example.org>.

> 	# Now let's write it rather explicit: 
> 	# (essentially just write 2 lines into the mailmap file)
> 	cat << EOF > .mailmap
> 	A <a@example.org> <changed_email@example.org>
> 	A <a@example.org> <A@example.org>
> 	EOF
> 		 
> 	# works as expected now
> 	git shortlog -sne
> 		 2  A <a@example.org>

Makes sense.

> 	# works as expected now as well
> 	git shortlog      
> 	A (2):
> 		  add test1
> 		  add test2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in .mailmap handling?
  2013-07-12 17:35 ` Junio C Hamano
@ 2013-07-12 17:47   ` Stefan Beller
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Beller @ 2013-07-12 17:47 UTC (permalink / raw)
  To: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 2947 bytes --]

The commands are exactly as given in the first mail.
(I run it multiple times, and this exact behavior comes up)

I think there is some problem with the capitalisation
of the first (if not all) characters. (Hence the small 'a')

So either check with these example commands yourself, or take my latest
patch for the mailmap to reproduce on real names, please.

On 07/12/2013 07:35 PM, Junio C Hamano wrote:
> Stefan Beller <stefanbeller@googlemail.com> writes:
> 
>> Hello,
>>
>> you may have noticed I am currently trying to bring the 
>> mailmap file of git itself up to date. I noticed
>> some behavior, which I did not expect. Have a look yourself:
>>
>> ---
>> 	# prepare test environment:
>> 	mkdir testmailmap
>> 	cd testmailmap/
>> 	git init
>>
>> 	# do a commit:
>> 	echo "asdf" > test1 
>> 	git add test1
>> 	git commit -a --author="A <A@example.org>" -m "add test1"
>>
>> 	# commit with same name, but different email 
>> 	# (different capitalization does the trick already, 
>> 	# but here I am going to use a different mail)
>> 	echo "asdf" > test2
>> 	git add test2
>> 	git commit -a --author="A <changed_email@example.org>" -m "add test2"
>>
>> 	# how do we know it's the same person?
>> 	git shortlog
>> 	A (2):
>> 		  add test1
>> 		  add test2
> 
> You don't, and it is a long known behaviour.
> 
>> 	# reports as expected:
>> 	git shortlog -sne
>> 		  1  A <A@example.org>
>> 		  1  A <changed_email@example.org>
> 
> Yes.
> 
>> 	# Adding the line to the mailmap should make life easy, so we know
>> 	# it's the same person
>> 	echo "A <A@example.org> <changed_email@example.org>" > .mailmap
>>
>> 	# Come on, I just wanted to have it reported as one person!
>> 	git shortlog -sne
>> 		 1  A <A@example.org>
>> 		 1  A <a@example.org>
> 
> Err, where does the lowercase a@ come from in the above?  Are we
> missing some steps before we get here?
> 
>> 	# So let's try another line in the mailmap file, (small 'a')
>> 	echo "A <a@example.org> <changed_email@example.org>" > .mailmap
> 
> This is ">", not ">>", I presume?  Otherwise changed_email is mapped
> to two destination, no?
> 
>> 	# We're not there yet?
>> 	git shortlog -sne
>> 		 1  A <A@example.org>
>> 		 1  A <a@example.org>
> 
> Expected, as long as some hidden set-up you did not describe that
> caused me to say "Err, where does the lowercase a@ come from" is
> there, i.e. one of the two commits is done by <a@example.org>.
> 
>> 	# Now let's write it rather explicit: 
>> 	# (essentially just write 2 lines into the mailmap file)
>> 	cat << EOF > .mailmap
>> 	A <a@example.org> <changed_email@example.org>
>> 	A <a@example.org> <A@example.org>
>> 	EOF
>> 		 
>> 	# works as expected now
>> 	git shortlog -sne
>> 		 2  A <a@example.org>
> 
> Makes sense.
> 
>> 	# works as expected now as well
>> 	git shortlog      
>> 	A (2):
>> 		  add test1
>> 		  add test2



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in .mailmap handling?
  2013-07-12 16:07 Bug in .mailmap handling? Stefan Beller
  2013-07-12 17:35 ` Junio C Hamano
@ 2013-07-12 20:28 ` Junio C Hamano
  2013-07-12 20:35   ` Stefan Beller
  2013-07-12 20:38   ` Junio C Hamano
  1 sibling, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2013-07-12 20:28 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <stefanbeller@googlemail.com> writes:

> 	# Adding the line to the mailmap should make life easy, so we know
> 	# it's the same person
> 	echo "A <A@example.org> <changed_email@example.org>" > .mailmap

While I was looking at this, I noticed this piece of code:

diff --git a/mailmap.c b/mailmap.c
index 2a7b366..418081e 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -122,7 +122,7 @@ static char *parse_name_and_email(char *buffer, char **name,
 	while (nend > nstart && isspace(*nend))
 		--nend;
 
-	*name = (nstart < nend ? nstart : NULL);
+	*name = (nstart <= nend ? nstart : NULL);
 	*email = left+1;
 	*(nend+1) = '\0';
 	*right++ = '\0';

The function is given a buffer "A <A@example.org>...", nstart scans
from the beginning of the buffer, skipping whitespaces (there isn't
any, so nstart points at the buffer), while nend starts from one
byte before the first '<' and skips whitespaces backwards and ends
at the first non-whitespace (i.e. it hits "A" at the beginning of
the buffer).  nstart == nend in this case for a single-letter name,
and an off-by-one error makes it fail to pick up the name, which
makes the entry equivalent to

	<A@example.org> <changed_email@example.org>

without the name.  I do not think this bug affected anything you
observed, though.

> 	git shortlog -sne
> 		 1  A <A@example.org>
> 		 1  A <a@example.org>

This is coming from mailmap.c::add_mapping() that downcases the
e-mail address.

changed_email@example.org is mapped to a@example.org because of this
downcasing, while "A <A@example.org>" does not have any entry for it
in the .mailmap file, so it is given back as-is.  Hence we see two
distinct entries.

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Bug in .mailmap handling?
  2013-07-12 20:28 ` Junio C Hamano
@ 2013-07-12 20:35   ` Stefan Beller
  2013-07-12 20:38   ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Stefan Beller @ 2013-07-12 20:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 07/12/2013 10:28 PM, Junio C Hamano wrote:
> Stefan Beller <stefanbeller@googlemail.com> writes:
> 
>> 	# Adding the line to the mailmap should make life easy, so we know
>> 	# it's the same person
>> 	echo "A <A@example.org> <changed_email@example.org>" > .mailmap
> 
> While I was looking at this, I noticed this piece of code:
> 
> diff --git a/mailmap.c b/mailmap.c
> index 2a7b366..418081e 100644
> --- a/mailmap.c
> +++ b/mailmap.c
> @@ -122,7 +122,7 @@ static char *parse_name_and_email(char *buffer, char **name,
>  	while (nend > nstart && isspace(*nend))
>  		--nend;
>  
> -	*name = (nstart < nend ? nstart : NULL);
> +	*name = (nstart <= nend ? nstart : NULL);
>  	*email = left+1;
>  	*(nend+1) = '\0';
>  	*right++ = '\0';
> 
> The function is given a buffer "A <A@example.org>...", nstart scans
> from the beginning of the buffer, skipping whitespaces (there isn't
> any, so nstart points at the buffer), while nend starts from one
> byte before the first '<' and skips whitespaces backwards and ends
> at the first non-whitespace (i.e. it hits "A" at the beginning of
> the buffer).  nstart == nend in this case for a single-letter name,
> and an off-by-one error makes it fail to pick up the name, which
> makes the entry equivalent to
> 
> 	<A@example.org> <changed_email@example.org>
> 
> without the name.  I do not think this bug affected anything you
> observed, though.
> 
>> 	git shortlog -sne
>> 		 1  A <A@example.org>
>> 		 1  A <a@example.org>
> 
> This is coming from mailmap.c::add_mapping() that downcases the
> e-mail address.
> 
> changed_email@example.org is mapped to a@example.org because of this
> downcasing, while "A <A@example.org>" does not have any entry for it
> in the .mailmap file, so it is given back as-is.  Hence we see two
> distinct entries.
> 

So do I understand it right, we're having 2 bugs in here?

One being triggered by the short name, only one character.
So if you want to debug the other bug with a longer name,
you can either use a made up name, or run
    git shortlog -sne |grep Knut
in the git repository having the mailmap file already updated.
The way the mailmap file is written, I'd assume only one line
to be found, as of now 2 lines come up
     2	Knut Franke <knut.franke@gmx.de>
     1	Knut Franke <Knut.Franke@gmx.de>

which seems to downcase the whole first email.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in .mailmap handling?
  2013-07-12 20:28 ` Junio C Hamano
  2013-07-12 20:35   ` Stefan Beller
@ 2013-07-12 20:38   ` Junio C Hamano
  2013-07-12 20:50     ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2013-07-12 20:38 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <stefanbeller@googlemail.com> writes:
>
>> 	git shortlog -sne
>> 		 1  A <A@example.org>
>> 		 1  A <a@example.org>
>
> This is coming from mailmap.c::add_mapping() that downcases the
> e-mail address.
>
> changed_email@example.org is mapped to a@example.org because of this
> downcasing, while "A <A@example.org>" does not have any entry for it
> in the .mailmap file, so it is given back as-is.  Hence we see two
> distinct entries.

I think it is wrong for the parser to lose information by
downcasing.

It is perfectly fine to do the comparison case insensitively, to
allow <changed_Email@example.org> in the input is mangled using the
entry for <changed_email@example.org> in the .mailmap file; it is
not fine to downcase the parsed result, especially the side that is
used as the "rewritten result" (i.e. the tokens earlier on the line),
as that would mean mangling the output.

Let me see if I can quickly whip up a fix.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in .mailmap handling?
  2013-07-12 20:38   ` Junio C Hamano
@ 2013-07-12 20:50     ` Junio C Hamano
  2013-07-12 21:13       ` [PATCH] Add a testcase for checking case insensitivity of mail map Stefan Beller
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2013-07-12 20:50 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Stefan Beller <stefanbeller@googlemail.com> writes:
>>
>>> 	git shortlog -sne
>>> 		 1  A <A@example.org>
>>> 		 1  A <a@example.org>
>>
>> This is coming from mailmap.c::add_mapping() that downcases the
>> e-mail address.
>>
>> changed_email@example.org is mapped to a@example.org because of this
>> downcasing, while "A <A@example.org>" does not have any entry for it
>> in the .mailmap file, so it is given back as-is.  Hence we see two
>> distinct entries.
>
> I think it is wrong for the parser to lose information by
> downcasing.
>
> It is perfectly fine to do the comparison case insensitively, to
> allow <changed_Email@example.org> in the input is mangled using the
> entry for <changed_email@example.org> in the .mailmap file; it is
> not fine to downcase the parsed result, especially the side that is
> used as the "rewritten result" (i.e. the tokens earlier on the line),
> as that would mean mangling the output.
>
> Let me see if I can quickly whip up a fix.

It might be just the matter of doing this.

I suspect that we could drop the downcasing of old-email, too, but
the new-email is supposed to be the rewritten result that appears on
the output, and downcasing it is very wrong.

I also suspect that this was an old workaround for the original
string-list that did not know how to match items case insensitively,
which we should have removed when we added case sensitivity support
to the string-list at around 577f63e7 (Merge branch
'ap/log-mailmap', 2013-01-20).

 mailmap.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mailmap.c b/mailmap.c
index 418081e..c64a53d 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -56,9 +56,6 @@ static void add_mapping(struct string_list *map,
 	if (old_email)
 		for (p = old_email; *p; p++)
 			*p = tolower(*p);
-	if (new_email)
-		for (p = new_email; *p; p++)
-			*p = tolower(*p);
 
 	if (old_email == NULL) {
 		old_email = new_email;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] Add a testcase for checking case insensitivity of mail map
  2013-07-12 20:50     ` Junio C Hamano
@ 2013-07-12 21:13       ` Stefan Beller
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Beller @ 2013-07-12 21:13 UTC (permalink / raw)
  To: git, Junio C Hamano; +Cc: Stefan Beller

Your patch seems to do it.
I added a test case which doesn't fail, if your patch is added.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>

---
 t/t4203-mailmap.sh | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 842b754..07152e9 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -409,4 +409,37 @@ test_expect_success 'Blame output (complex mapping)' '
 	test_cmp expect actual.fuzz
 '
 
+test_expect_success 'cleanup after mailmap.blob tests' '
+	rm -f .mailmap
+'
+
+cat >expect <<\EOF
+     2	A <A@example.org>
+     2	Other Author <other@author.xx>
+     2	Santa Claus <santa.claus@northpole.xx>
+     1	A U Thor <author@example.com>
+     1	CTO <cto@company.xx>
+     1	Some Dude <some@dude.xx>
+EOF
+
+test_expect_success 'Test case sensitivity of Names' '
+	# do a commit:
+	echo "asdf" > test1
+	git add test1
+	git commit -a --author="A <A@example.org>" -m "add test1"
+
+	# commit with same name, but different email
+	# (different capitalization does the trick already,
+	# but here I am going to use a different mail)
+	echo "asdf" > test2
+	git add test2
+	git commit -a --author="A <changed_email@example.org>" -m "add test2"
+
+	# Adding the line to the mailmap should make life easy, so we know
+	# it is the same person
+	echo "A <A@example.org> <changed_email@example.org>" > .mailmap
+
+	git shortlog -sne HEAD >actual && test_cmp expect actual
+'
+
 test_done
-- 
1.8.3.2.776.gfcf213d

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-07-12 21:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-12 16:07 Bug in .mailmap handling? Stefan Beller
2013-07-12 17:35 ` Junio C Hamano
2013-07-12 17:47   ` Stefan Beller
2013-07-12 20:28 ` Junio C Hamano
2013-07-12 20:35   ` Stefan Beller
2013-07-12 20:38   ` Junio C Hamano
2013-07-12 20:50     ` Junio C Hamano
2013-07-12 21:13       ` [PATCH] Add a testcase for checking case insensitivity of mail map Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).