git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding
@ 2012-03-08 11:51 Jonathan Nieder
  2012-03-08 11:57 ` [PATCH 1/2] hooks/post-receive-email: set encoding to utf-8 Jonathan Nieder
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jonathan Nieder @ 2012-03-08 11:51 UTC (permalink / raw)
  To: git
  Cc: Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty,
	Jon Jensen

Hi,

These patches revisit the bug described at [1], where the sample
post-receive script annoys receiving MUAs by not declaring what
encoding it uses.  Worse, sometimes the mails have a mixture of
encodings.

These patches standardize on UTF-8, but that is only for the sake of
simplicity.  A patch on top to make the choice of encoding
customizable would probably not be too complicated, if someone is
interested.

Patches are targetted at 1.7.11 unless there is overwhelming
interest in them landing sooner.  Thanks to Alexander Gerasiov
<gq@cs.msu.su> for the writing a patch long ago to get this
started[1].

Thoughts?

Gerrit Pape (1):
  bug#506445: hooks/post-receive-email: set encoding to utf-8

Jonathan Nieder (1):
  post-receive-email: defend against non-utf8 [i18n] logoutputencoding
    setting

 contrib/hooks/post-receive-email |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

[1] http://thread.gmane.org/gmane.comp.version-control.git/181737/focus=181755
[2] http://bugs.debian.org/506445

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] hooks/post-receive-email: set encoding to utf-8
  2012-03-08 11:51 [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Jonathan Nieder
@ 2012-03-08 11:57 ` Jonathan Nieder
  2012-03-08 11:59 ` [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting Jonathan Nieder
  2012-03-08 19:12 ` [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Junio C Hamano
  2 siblings, 0 replies; 9+ messages in thread
From: Jonathan Nieder @ 2012-03-08 11:57 UTC (permalink / raw)
  To: git
  Cc: Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty,
	Jon Jensen

From: Gerrit Pape <pape@smarden.org>

"git log" generates logs in UTF-8 encoding by default, but the
post-receive-email example hook does not declare any encoding in
the emails it sends.  So add a line there:

+	Content-Type: text/plain; charset=utf-8

[jn: tweaked to also set the Content-Transfer-Encoding so MTAs know
what kind of mangling might be needed when sending to a non 8-bit
clean SMTP host]

Requested-by: Alexander Gerasiov <gq@cs.msu.su>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 contrib/hooks/post-receive-email |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index 01af9df1..dc184d0b 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -233,6 +233,9 @@ generate_email_header()
 	cat <<-EOF
 	To: $recipients
 	Subject: ${emailprefix}$projectdesc $refname_type $short_refname ${change_type}d. $describe
+	MIME-Version: 1.0
+	Content-Type: text/plain; charset=utf-8
+	Content-Transfer-Encoding: 8bit
 	X-Git-Refname: $refname
 	X-Git-Reftype: $refname_type
 	X-Git-Oldrev: $oldrev
-- 
1.7.9.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting
  2012-03-08 11:51 [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Jonathan Nieder
  2012-03-08 11:57 ` [PATCH 1/2] hooks/post-receive-email: set encoding to utf-8 Jonathan Nieder
@ 2012-03-08 11:59 ` Jonathan Nieder
  2012-03-08 13:50   ` Jeff King
  2012-03-14 19:36   ` Alexey Shumkin
  2012-03-08 19:12 ` [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Junio C Hamano
  2 siblings, 2 replies; 9+ messages in thread
From: Jonathan Nieder @ 2012-03-08 11:59 UTC (permalink / raw)
  To: git
  Cc: Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty,
	Jon Jensen

Johannes Sixt wrote:
> Am 9/20/2011 12:42, schrieb Shumkin Alexey:

>> 1. post-send-mail uses description file of a repo
>> 2. gitweb also uses this file and AFAIK it assumes one to be in UTF-8
>>   (I do not know whether it can be changed there but I tested gitweb once long
>>     time ago)
>> 3. So if i18n.logoutputencoding is not UTF-8 we get a message composed
>> 	with mixed encodings. This fact oblidge us to encode headers
>> 	(as quoted printable at least) and synchronize body message that contain
>> 	repo description (in UTF-8) and diffstat (in i18n.logoutputencoding).
[...]
> In this case, it may make sense to have a separate setting, but you should
> call git like this:
>
>    git -c "i18n.logoutputencoding=$emailcharset" show ...
>    git -c "i18n.logoutputencoding=$emailcharset" rev-list --pretty ...

Something like this, I suppose?

This teaches post-receive-email to use plumbing where possible and to
explicitly declare what encoding it expects output to use.

Based on an advice from Alexey Shumkin and Johannes Sixt, but all bugs
are mine.  Making the email charset configurable is left as an
exercise for the interested reader.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 contrib/hooks/post-receive-email |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index dc184d0b..b59e03cd 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -464,7 +464,7 @@ generate_delete_branch_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git show -s --pretty=oneline $oldrev
+	git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
@@ -540,11 +540,11 @@ generate_atag_email()
 		# performed on them
 		if [ -n "$prevtag" ]; then
 			# Show changes since the previous release
-			git rev-list --pretty=short "$prevtag..$newrev" | git shortlog
+			git shortlog --encoding=UTF-8 "$prevtag..$newrev"
 		else
 			# No previous tag, show all the changes since time
 			# began
-			git rev-list --pretty=short $newrev | git shortlog
+			git shortlog --encoding=UTF-8 "$newrev"
 		fi
 		;;
 	*)
@@ -564,7 +564,7 @@ generate_delete_atag_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git show -s --pretty=oneline $oldrev
+	git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
@@ -610,7 +610,7 @@ generate_general_email()
 	echo ""
 	if [ "$newrev_type" = "commit" ]; then
 		echo $LOGBEGIN
-		git show --no-color --root -s --pretty=medium $newrev
+		git diff-tree --encoding=UTF-8 --root -s --pretty=oneline $newrev
 		echo $LOGEND
 	else
 		# What can we do here?  The tag marks an object that is not
@@ -629,7 +629,7 @@ generate_delete_general_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git show -s --pretty=oneline $oldrev
+	git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
-- 
1.7.9.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting
  2012-03-08 11:59 ` [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting Jonathan Nieder
@ 2012-03-08 13:50   ` Jeff King
  2012-03-08 13:52     ` Jonathan Nieder
  2012-03-14 19:36   ` Alexey Shumkin
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2012-03-08 13:50 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty,
	Jon Jensen

On Thu, Mar 08, 2012 at 05:59:57AM -0600, Jonathan Nieder wrote:

> diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
> index dc184d0b..b59e03cd 100755
> --- a/contrib/hooks/post-receive-email
> +++ b/contrib/hooks/post-receive-email
> @@ -464,7 +464,7 @@ generate_delete_branch_email()
>  	echo "       was  $oldrev"
>  	echo ""
>  	echo $LOGBEGIN
> -	git show -s --pretty=oneline $oldrev
> +	git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev
>  	echo $LOGEND

If you are using "-s" to suppress diff output, why are you using
diff-tree? Wouldn't "rev-list -1" (or "rev-list --no-walk") work equally
well and be a little more obvious?

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting
  2012-03-08 13:50   ` Jeff King
@ 2012-03-08 13:52     ` Jonathan Nieder
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Nieder @ 2012-03-08 13:52 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty,
	Jon Jensen

Jeff King wrote:

> If you are using "-s" to suppress diff output, why are you using
> diff-tree? Wouldn't "rev-list -1" (or "rev-list --no-walk") work equally
> well and be a little more obvious?

Just a habit --- some part of me is wired to think "diff-tree is the
command to show a commit".  Maybe rev-list didn't support --pretty in
olden days?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding
  2012-03-08 11:51 [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Jonathan Nieder
  2012-03-08 11:57 ` [PATCH 1/2] hooks/post-receive-email: set encoding to utf-8 Jonathan Nieder
  2012-03-08 11:59 ` [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting Jonathan Nieder
@ 2012-03-08 19:12 ` Junio C Hamano
  2012-03-14 17:54   ` Jon Jensen
  2 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2012-03-08 19:12 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty,
	Jon Jensen

Jonathan Nieder <jrnieder@gmail.com> writes:

> Patches are targetted at 1.7.11 unless there is overwhelming
> interest in them landing sooner.

Even if there is none, I wouldn't mind taking it for 1.7.10, if only
to reduce the chance that the patch gets forgotten again, provided
that there is a clear concensus that this is a good change that won't
regress people who have been actually using this sample in real life.

> <gq@cs.msu.su> for the writing a patch long ago to get this
> started[1].
>
> Thoughts?
>
> Gerrit Pape (1):
>   bug#506445: hooks/post-receive-email: set encoding to utf-8
>
> Jonathan Nieder (1):
>   post-receive-email: defend against non-utf8 [i18n] logoutputencoding
>     setting
>
>  contrib/hooks/post-receive-email |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> [1] http://thread.gmane.org/gmane.comp.version-control.git/181737/focus=181755
> [2] http://bugs.debian.org/506445

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding
  2012-03-08 19:12 ` [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Junio C Hamano
@ 2012-03-14 17:54   ` Jon Jensen
  0 siblings, 0 replies; 9+ messages in thread
From: Jon Jensen @ 2012-03-14 17:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, git, Gerrit Pape, Alexey Shumkin, Johannes Sixt,
	Michael Haggerty

On Thu, 8 Mar 2012, Junio C Hamano wrote:

>> Patches are targetted at 1.7.11 unless there is overwhelming interest 
>> in them landing sooner.
>
> Even if there is none, I wouldn't mind taking it for 1.7.10, if only to 
> reduce the chance that the patch gets forgotten again, provided that 
> there is a clear concensus that this is a good change that won't regress 
> people who have been actually using this sample in real life.

+1 from me. This would solve an occasional problem we have because 
encodings aren't set in post-receive-email output.

Jon

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting
  2012-03-08 11:59 ` [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting Jonathan Nieder
  2012-03-08 13:50   ` Jeff King
@ 2012-03-14 19:36   ` Alexey Shumkin
  2012-03-14 19:55     ` Jonathan Nieder
  1 sibling, 1 reply; 9+ messages in thread
From: Alexey Shumkin @ 2012-03-14 19:36 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Gerrit Pape, Johannes Sixt, Michael Haggerty, Jon Jensen

I'd like to remind then following aspect
> My proposition was in to send email message in explicitly defined
> custom encoding. Why? In development process under Windows non-UTF-8
> encoding is used (cp1251 in my case). So, filenames have this
> encoding, and as we know Git stores their names as is - in cp1251 -
> without a conversion. And filenames are also used in diff-stat (with
> core.quotepath= false, BTW, I did not take into account this config)
> without any conversion. So if we'll make all text in UTF-8 but
> filenames are still non-UTF-8, email would look corrupted.
this is citation from 
http://thread.gmane.org/gmane.comp.version-control.git/181737/focus=183076

> Johannes Sixt wrote:
> > Am 9/20/2011 12:42, schrieb Shumkin Alexey:
> 
> >> 1. post-send-mail uses description file of a repo
> >> 2. gitweb also uses this file and AFAIK it assumes one to be in
> >> UTF-8 (I do not know whether it can be changed there but I tested
> >> gitweb once long time ago)
> >> 3. So if i18n.logoutputencoding is not UTF-8 we get a message
> >> composed with mixed encodings. This fact oblidge us to encode
> >> headers (as quoted printable at least) and synchronize body
> >> message that contain repo description (in UTF-8) and diffstat (in
> >> i18n.logoutputencoding).
> [...]
> > In this case, it may make sense to have a separate setting, but you
> > should call git like this:
> >
> >    git -c "i18n.logoutputencoding=$emailcharset" show ...
> >    git -c "i18n.logoutputencoding=$emailcharset" rev-list
> > --pretty ...
> 
> Something like this, I suppose?
> 
> This teaches post-receive-email to use plumbing where possible and to
> explicitly declare what encoding it expects output to use.
> 
> Based on an advice from Alexey Shumkin and Johannes Sixt, but all bugs
> are mine.  Making the email charset configurable is left as an
> exercise for the interested reader.
> 
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> ---
>  contrib/hooks/post-receive-email |   12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/contrib/hooks/post-receive-email
> b/contrib/hooks/post-receive-email index dc184d0b..b59e03cd 100755
> --- a/contrib/hooks/post-receive-email
> +++ b/contrib/hooks/post-receive-email
> @@ -464,7 +464,7 @@ generate_delete_branch_email()
>  	echo "       was  $oldrev"
>  	echo ""
>  	echo $LOGBEGIN
> -	git show -s --pretty=oneline $oldrev
> +	git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev
>  	echo $LOGEND
>  }
>  
> @@ -540,11 +540,11 @@ generate_atag_email()
>  		# performed on them
>  		if [ -n "$prevtag" ]; then
>  			# Show changes since the previous release
> -			git rev-list --pretty=short
> "$prevtag..$newrev" | git shortlog
> +			git shortlog --encoding=UTF-8
> "$prevtag..$newrev" else
>  			# No previous tag, show all the changes
> since time # began
> -			git rev-list --pretty=short $newrev | git
> shortlog
> +			git shortlog --encoding=UTF-8 "$newrev"
>  		fi
>  		;;
>  	*)
> @@ -564,7 +564,7 @@ generate_delete_atag_email()
>  	echo "       was  $oldrev"
>  	echo ""
>  	echo $LOGBEGIN
> -	git show -s --pretty=oneline $oldrev
> +	git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev
>  	echo $LOGEND
>  }
>  
> @@ -610,7 +610,7 @@ generate_general_email()
>  	echo ""
>  	if [ "$newrev_type" = "commit" ]; then
>  		echo $LOGBEGIN
> -		git show --no-color --root -s --pretty=medium $newrev
> +		git diff-tree --encoding=UTF-8 --root -s
> --pretty=oneline $newrev echo $LOGEND
>  	else
>  		# What can we do here?  The tag marks an object that
> is not @@ -629,7 +629,7 @@ generate_delete_general_email()
>  	echo "       was  $oldrev"
>  	echo ""
>  	echo $LOGBEGIN
> -	git show -s --pretty=oneline $oldrev
> +	git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev
>  	echo $LOGEND
>  }
>  

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting
  2012-03-14 19:36   ` Alexey Shumkin
@ 2012-03-14 19:55     ` Jonathan Nieder
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Nieder @ 2012-03-14 19:55 UTC (permalink / raw)
  To: Alexey Shumkin
  Cc: git, Gerrit Pape, Johannes Sixt, Michael Haggerty, Jon Jensen

Alexey Shumkin wrote:

> I'd like to remind then following aspect
>
>> My proposition was in to send email message in explicitly defined
>> custom encoding. Why? In development process under Windows non-UTF-8
>> encoding is used (cp1251 in my case). So, filenames have this
>> encoding, and as we know Git stores their names as is
[...]
>>                                                                 with
>> core.quotepath= false

Sure.  Do you think this patch makes that problem worse, and if so, do
you have any ideas about how that could be prevented?  Otherwise:

>>            Making the email charset configurable is left as an
>> exercise for the interested reader.

I did not want to do that part because I do not trust myself to
understand the needs of people using non-utf8 and test it
appropriately, but I tried to make sure the patch was structured in a
way that would make it easy.

Hoping clarifies a little,
Jonathan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-03-14 19:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-08 11:51 [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Jonathan Nieder
2012-03-08 11:57 ` [PATCH 1/2] hooks/post-receive-email: set encoding to utf-8 Jonathan Nieder
2012-03-08 11:59 ` [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting Jonathan Nieder
2012-03-08 13:50   ` Jeff King
2012-03-08 13:52     ` Jonathan Nieder
2012-03-14 19:36   ` Alexey Shumkin
2012-03-14 19:55     ` Jonathan Nieder
2012-03-08 19:12 ` [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Junio C Hamano
2012-03-14 17:54   ` Jon Jensen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).